INTRODUCTORY STATISTICS



To Monique and Eloise 



Preface



Our objective has been to write a text that would come into the statistics 
market between the two texts written by Paul G. Hoel (or the two texts 
written by John E. Freund). We have tried to cover most of the material in 
their mathematical statistics books, but we have used mathematics only 
slightly more difficult than that used in their elementary books. Calculus is
used only in sections where the argument is difficult to develop without it;
although this puts the calculus student at an advantage, we have made a
special effort to design these sections so that a student without calculus can
also follow.

By requiring a little more mathematics than many other elementary 
texts we have been able to treat many important topics normally covered only 
by books in mathematical statistics: for example, the relation of sampling
and inference to the theory of probability and random variables. Another 
objective has been to show the logical relation between topics that have often 
appeared in texts as separate and isolated chapters: for example, the equiv- 
alence of interval estimation and hypothesis testing, of the t test and F test, 
and of analysis of variance and regression using dummy variables. In every 
case our motivation has been twofold: to help the student appreciate, indeed
enjoy, the underlying logic, and to help him arrive at answers to practical
problems.

We have placed high priority on the regression model, not only because
regression is widely regarded as the most powerful tool of the practicing
statistician, but also because it provides a good focal point for understanding
such related techniques as correlation and analysis of variance. 

Our original aim was to write an introduction to statistics for economic
students, but as our efforts increased, so it seems did our ambitions.
Accordingly, this book is now written for students in economics and other social
sciences, for business schools, and for service courses in statistics provided
by mathematics departments. Some of the topics covered are typically
omitted from introductory courses, but are of interest to such a broad
audience: for example, multiple comparisons, multiple regression, Bayesian
decisions, and game theory.






A statistics text aimed at several audiences — including students with and 
without calculus — raises major problems of evenness and design. The text 
itself is kept simple, with the more difficult interpretations and developments 
reserved for footnotes and starred sections. In all instances these are optional;
a special effort has been made to allow the more elementary student to skip 
these completely without losing continuity. Moreover, some of the finer 
points are deferred to the instructor's manual. Thus the instructor is allowed, 
at least to some degree, to tailor the course to his students' background. 

Problems are also starred (*) if they are more difficult, or set with an 
arrow (=>) if they introduce important ideas taken up later in the text, or 
bracketed ( ) if they duplicate previous problems, and thus provide optional 
exercise only. 

Our experience has been that this is about the right amount of material 
for a two-semester course; a single semester introduction is easily designed 
to include the first 7, 8, or 9 chapters. We have also found that majors in 
economics who may be pushed a bit harder can cover the first 10 chapters in 
one semester. This has allowed us in the second semester to use our forth- 
coming Econometrics text which provides more detailed coverage of the 
material in Chapters 11 to 15 of this book, plus additional material on serial 
correlation, identification, and other econometric problems. 

So many have contributed to this book that it is impossible to thank 
them all individually. However, a special vote of thanks should go, without 
implication, to the following for their thoughtful reviews: Harvey J. Arnold, 
David A. Belsley, Ralph A. Bradley, Edward Greenberg, Leonard Kent, 
R. W. Pfouts, and especially Franklin M. Fisher. We are also indebted to our 
teaching assistants and the students in both mathematics and economics at 
the University of Western Ontario and Wesleyan (Connecticut) who suggested 
many improvements during a two-year classroom test. 



London, Ontario, Canada 
September, 1968 



Thomas H. Wonnacott 
Ronald J. Wonnacott 
chapter 1

Introduction 



The word "statistics" originally meant the collection of population and 
economic information vital to the state. From that modest beginning, 
statistics has grown into a scientific method of analysis now applied to all 
the social and natural sciences, and one of the major branches of mathematics.
The present aims and methods of statistics are best illustrated with
a familiar example.



1-1 EXAMPLE 

Before every presidential election, the pollsters try to pick the winner;
specifically, they try to guess the proportion of the population that will
vote for each candidate. Clearly, canvassing all voters would be a hopeless
task. As the only alternative, they survey a sample of a few thousand in the
hope that the sample proportion will be a good estimate of the total population
proportion. This is a typical example of statistical inference or statistical
induction: the (voting) characteristics of an unknown population are inferred
from the (voting) characteristics of an observed sample.

As any pollster will admit, it is an uncertain business. To be sure of the 
population, one has to wait until election day when all votes are counted. 
Yet if the sampling is done fairly and adequately, we can have high hopes 
that the sample proportion will be close to the population proportion. This 
allows us to estimate the unknown population proportion $\pi$ from the
observed sample proportion ($P$), as follows:

$\pi = P \pm \text{a small error}$    (1-1)

with crucial questions being, "How small is this error?" and "How sure are
we that we are right?"
Since this typifies the very core of the book, we state it more precisely
in the language of Chapter 7 (where the reader will find the proof and a
fuller understanding).

If the sampling is random and large enough, we can state with 95%
confidence that

$\pi = P \pm 1.96 \sqrt{\frac{P(1 - P)}{n}}$    (1-2)

where $\pi$ and $P$ are the population and sample proportion, and $n$ is the sample
size.

As an illustration of how this formula works, suppose we have sampled
1,000 voters, with 600 choosing the Democratic candidate. With this sample
proportion of .60, equation (1-2) becomes

$\pi = .60 \pm 1.96 \sqrt{\frac{.60(1 - .60)}{1{,}000}}$

or approximately

$\pi = .60 \pm .03$    (1-3)

Thus, with 95% confidence, we estimate the population proportion voting
Democrat to be between .57 and .63.

This is referred to as a confidence interval, and making estimates of this
kind will be one of our major objectives in this book. The other objective
is to test hypotheses. For example, suppose we wish to test the hypothesis
that the Republican candidate will win the election. On the basis of the
information in equation (1-3) we would reject this claim; it is no surprise
that a sample result that pointed to a Democratic majority of 57 to 63% of
the vote will also allow us to reject the hypothesis of a Republican victory.
In general, there is a very close association of this kind between confidence
intervals and hypothesis tests; indeed, we will show that in many instances
they are equivalent procedures.

We pause to make several other crucial observations about equation
(1-3).

1. The estimate is not made with certainty; we are only 95% confident.
We must concede the possibility that we are wrong, and wrong because we
were unlucky enough to draw a misleading sample. Thus, even if less than
half the population is in fact Democratic, it is still possible, although
unlikely, for us to run into a string of Democrats in our sample. In such
circumstances, our conclusion (1-3) would be dead wrong. Since this sort of bad
luck is possible, but not likely, we can be 95% confident of our conclusion.

2. Luck becomes less of a factor as sample size increases; the more
voters we canvass, the less likely we are to draw a predominantly Democratic
sample from a Republican population. Hence, the more precise our prediction.
Formally, this is confirmed in equation (1-2); in this formula we note
that the error term decreases with sample size. Thus, if we increased our
sample to 10,000 voters, and continued to observe a Democratic proportion
of .60, our 95% confidence interval would become the more precise:

.60 ± .01 (1-4)

3. Suppose our employer indicates that 95% confidence is not good 
enough. "Come back when you are 99% sure of your conclusion." We now 
have two options. One is to increase our sample size; as a result of this 
additional cost and effort we will be able to make an interval estimate with
the precision of (1-4) but at a higher level of confidence. But if the additional
resources for further sampling are not available, then we can increase our
confidence only by making a less precise statement, i.e., that the proportion
of Democrats is

.60 ± .02 

The less we commit ourselves to a precise prediction, the more confident 
we can be that we are right. In the limit, there are only two ways that we 
can be certain of avoiding an erroneous conclusion. One is to make a state- 
ment so imprecise that it cannot be contradicted. 1 The other is to sample the 
whole population 2 ; but this is not statistics — it is just counting. Meaningful 
statistical conclusions must be prefaced by some degree of uncertainty. 
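
The arithmetic behind intervals (1-3) and (1-4) can be checked with a short sketch. It assumes that equation (1-2) has the usual large-sample form $\pi = P \pm 1.96\sqrt{P(1-P)/n}$; the function name `proportion_interval` is ours, chosen only for illustration.

```python
import math

def proportion_interval(p_hat, n, z=1.96):
    """Return (lower, upper, margin) for a large-sample interval on a proportion.

    z = 1.96 corresponds to 95% confidence under the normal approximation
    (an assumption of this sketch, matching the form taken for equation 1-2).
    """
    margin = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin

# The two cases in the text: 600 Democrats among 1,000 voters,
# then the same sample proportion with 10,000 voters.
for n in (1_000, 10_000):
    lower, upper, margin = proportion_interval(0.60, n)
    print(f"n = {n:>6}: .60 +/- {margin:.3f}  ->  ({lower:.3f}, {upper:.3f})")
```

Multiplying the sample size by ten shrinks the error term only by a factor of about three, since the error varies with the square root of $n$.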



1-2 INDUCTION AND DEDUCTION 

Figure 1-1 illustrates the difference between inductive and deductive
reasoning. Induction involves arguing from the specific to the general, or
(in our case) from the sample to the population. Deduction is the reverse:
arguing from the general to the specific, i.e., from the population to the
sample. 3 Equation (1-1) represents inductive reasoning; we are arguing from
a sample proportion to a population proportion. But this is only possible

1 E.g., $\pi = .50 \pm .50$.

2 Or, almost the whole population. Thus it would not be necessary to poll the whole
population to determine the winner of an election; it would only be necessary to continue
canvassing until one candidate comes up with a majority. (It is always possible, of course,
that some people change their mind between the sample survey and their actual vote, but
we don't deal with this issue here.)

3 The student can easily keep these straight with the help of a little Latin, and recognition
that the population is the point of reference. The prefix in means "into" or "towards."
Thus induction is arguing towards the population. The prefix de means "away from." Thus
deduction is arguing away from the population. Finally, statistical inference is based
on induction.
FIG. 1-1 Induction and deduction contrasted. (a) Induction (statistical inference).
(b) Deduction (probability).



if we study the simpler problem of deduction first. Specifically, in equation
(1-1), we note that the inductive statement (that the population proportion
can be inferred from the sample proportion) is based on a prior deduction
(that the sample proportion is likely to be close to the population proportion).

Chapters 2 through 5 are devoted to deduction. This involves, for
example, the study of probability, which is useful for its own sake (e.g., in
Game Theory); but it is even more useful as the basis for statistical induction
dealt with in Chapters 7 through 10. In short, in the first 6 chapters we ask,
"With a given population, how will a sample behave? Will the sample be
'on target'?" Only when this deductive issue is resolved can we move to
questions of statistical inference. This involves, in the later chapters, turning
the argument around and asking "How precisely can we make inferences
about an unknown population from an observed sample?"

1-3 WHY SAMPLE? 

We sample, rather than study the whole population, for any one of 
three reasons. 

(1) Limited resources.

(2) Limited data available.

(3) Destructive testing.

1. Limited resources almost always play some part. In our example of
preelection polls, funds were not available to observe the whole population;
but this is not the only reason for sampling.

2. Sometimes there is only a small sample available, no matter what cost 
may be incurred. For example, an anthropologist may wish to test the theory 
that the two civilizations on islands A and B have developed independently, 
with their own distinctive characteristics of weight, height, etc. But there is 
no way in which he can compare the two civilizations in toto. Instead he 
must make an inference from the small sample of the 50 surviving inhabitants
of island A and the 100 surviving inhabitants of island B. The sample size
is fixed by nature, rather than by the researcher's budget. 

There are many examples in business. An allegedly more efficient
machine may be introduced for testing, with a view to the purchase of
additional similar units. The manager of quality control simply cannot wait
around to observe the entire population this machine will produce. Instead
a sample run must be observed, with the decision on efficiency based on an
inference from this sample.

3. Sampling may involve destructive testing. For example, suppose we 
have produced a thousand light bulbs and wish to know their average life. 
It would be folly to insist on observing the whole population of bulbs until 
they burn out. 

1-4 HOW TO SAMPLE 



In statistics, as in business or any other profession, it is essential to
distinguish between bad luck and bad management. For example, suppose a
man bets you $100 at even odds that you will get an ace (i.e., 1 dot) in rolling
a die. You accept the challenge, roll an ace, and he wins. He's a bad manager 
and you're a good one; he has merely overcome his bad management with 
extremely good luck. Your only defense against this combination is to get 
him to keep playing the game — with your dice. 

If we now return to our original example of preelection polls, we note 
that the sample proportion of Democrats may badly misrepresent the 
population proportion for either (or both) of these reasons. No matter how 
well managed and designed our sampling procedure may be, we may be 
unlucky enough to turn up a Democratic sample from a Republican popula- 
tion. Equation (1-2) relates to this case; it is assumed that the only complica- 
tion is the luck of the draw, and not mismanagement. From that equation 
we confirm that the best defense against bad luck is to "keep playing"; by 
increasing our sample size, we improve the reliability of our estimate. 

The other problem is that sampling can be badly mismanaged or biased. 
For example, in sampling a population of voters, it is a mistake to take their 
names from a phone book, since poor voters who often cannot afford 
telephones are badly underrepresented. 

Other examples of biased samples are easy to find and often amusing. 
"Straw polls" of people on the street are often biased because the interviewer 
tends to select people that seem civil and well dressed; the surly worker or 
harassed mother is overlooked. A congressman can not rely on his mail as 
an unbiased sample of his constituency, for this is a sample of people with 
strong opinions, and includes an inordinate number of cranks and members 
of pressure groups. 

The simplest way to ensure an unbiased sample is to give each member 
of the population an equal chance of being included in the sample. This, in 
fact, is our definition of a "random" sample. 4 For a sample to be random, 
it cannot be chosen in a sloppy or haphazard way; it must be carefully 
designed. A sample of the first thousand people encountered on a New York 
street corner will not be a random sample of the U.S. population. Instead, 
it is necessary to draw some of our sample from the West, some from the 
East, and so on. Only if our sample is randomized, will it be free of bias and, 
equally important, only then will it satisfy the assumptions of probability 
theory, and allow us to make scientific inferences of the form of (1-2). 
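
The contrast between a simple random sample and a biased one can be sketched in a small simulation. Everything numerical here is invented for illustration: a hypothetical population of 100,000 voters, a 45% Democratic share, and telephone-ownership rates that differ by party (our stand-in for the phone-book example above).

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: each voter is (is_democrat, has_phone).
# The 45% share and the ownership rates below are assumptions, not data.
population = []
for _ in range(100_000):
    democrat = rng.random() < 0.45
    has_phone = rng.random() < (0.5 if democrat else 0.8)
    population.append((democrat, has_phone))

def share_democrat(voters):
    """Proportion of Democrats among the given voters."""
    return sum(d for d, _ in voters) / len(voters)

# Simple random sample: every voter has the same chance of being chosen.
random_sample = rng.sample(population, 1_000)

# Biased sample: chosen at random, but only from voters listed with a phone.
phone_book = [v for v in population if v[1]]
biased_sample = rng.sample(phone_book, 1_000)

print("population   :", round(share_democrat(population), 3))
print("random sample:", round(share_democrat(random_sample), 3))
print("phone sample :", round(share_democrat(biased_sample), 3))
```

Under these assumptions the random sample misses the population proportion only by the luck of the draw, while the phone-book sample is off systematically, however large it is made.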

In some circumstances, the only available sample will be a nonrandom 
one. While probability theory often cannot be strictly applied to such a 
sample, it still may provide the basis for a good educated guess — or what we 
might term the art of inference. Although this art is very important, it cannot 
be taught in an elementary text; we, therefore, consider only scientific 

4 Strictly speaking, this is called "simple random sampling," to distinguish it from more 
complex types of random sampling. 
inference based on the assumption that samples are random. The techniques
for ensuring this are discussed further in Chapter 6.
chapter 2

Descriptive Statistics for Samples 



2-1 INTRODUCTION 

We have already discussed the primary purpose of statistics — to make 
an inference to the whole population from a sample. As a preliminary step, 
the sample must be simplified, and reduced to a few descriptive numbers; 
each is called a sample statistic. 1

In the very simple example of Chapter 1, the pollster would record the 
answers of the 1000 people in his sample, obtaining a sequence such as 
D D R D R . . . . where D and R represent Democrat and Republican. The 
best way of describing this sample by a single number is the statistic P, the 
sample proportion of Democrats; this will be used to make an inference 
about $\pi$, the population proportion. Admittedly, this statistic is trivial to
compute. In the sample of the previous chapter, computing the sample 
proportion (.60) required only a count of the number voting Democrat 
(600), followed by a division by sample size, (n = 1,000). 

We now turn to the more substantial computations of statistics to 
describe two other samples. 

(a) The results when a die is thrown 50 times. 

(b) The average height of a sample of 200 American men. 

2-2 FREQUENCY TABLES AND GRAPHS 

(a) Discrete Example 

Each time we toss the die, we record the number of dots X, which takes 
on the values 1, 2, . . . , 6. X is called a "discrete" random variable because 
it assumes only a finite (or countably infinite) number of values. 

1 Later, we shall have to define a statistic more rigorously; but for now, this will suffice. 
Table 2-1 Results of Tossing a Die 50 Times

6, 2, 2, 3, 5, 1, 2, 6, 4, 2.



The 50 throws yield a string of 50 numbers such as given in Table 2-1. 

To simplify, we keep a running tally of each of the six possible outcomes
in Table 2-2. In column 3 we note that 9 is the frequency $f$ (or total number
of times) that we rolled a 1; i.e., we obtained this outcome on 9/50 of our
tosses. Formally, this proportion (.18) is called relative frequency ($f/n$); it
is computed in column 4.



Table 2-2 Calculation of the Frequency, and Relative Frequency of the
Number of Dots in 50 Tosses of a Die

(1)               (2)                (3)              (4)
Number of Dots    Tally              Frequency (f)    Relative Frequency (f/n)

1                 ||||| ||||          9               .18
2                 ||||| ||||| ||     12               .24
3                 ||||| |             6               .12
4                 ||||| |||           8               .16
5                 ||||| |||||        10               .20
6                 |||||               5               .10

                                     $\sum f = 50 = n$    $\sum (f/n) = 1.00$

where $\sum f$ is "the sum of all $f$."



The information in column 3 is called a "frequency distribution," and 
is graphed in Figure 2-1. The "relative frequency distribution" in column 4
can be similarly graphed; the student who does so will note that the two 
graphs are identical except for the vertical scale. Hence, a simple change of 
vertical scale transforms Figure 2-1 into a relative frequency distribution. 
This now gives us an immediate picture of the sample result. 
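
The tallying in Table 2-2 is easily mechanized. The sketch below is only an illustration: the list `tosses` is a hypothetical arrangement built to match the frequencies in column 3, not the actual sequence of 50 tosses.

```python
from collections import Counter

# Hypothetical tosses arranged to match the frequencies in Table 2-2
# (the order of the real 50 tosses is not reproduced here).
tosses = [1] * 9 + [2] * 12 + [3] * 6 + [4] * 8 + [5] * 10 + [6] * 5

n = len(tosses)
frequency = Counter(tosses)                              # column 3: f
relative = {x: frequency[x] / n for x in range(1, 7)}    # column 4: f/n

for x in range(1, 7):
    print(f"{x}  {frequency[x]:>2}  {relative[x]:.2f}")

print("sum of f   =", sum(frequency.values()))
print("sum of f/n =", round(sum(relative.values()), 2))
```

Dividing every frequency by the same $n$ is what makes the two graphs identical apart from the vertical scale.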



(b) Continuous Example 

Suppose that a sample of 200 men is drawn from a certain population,
with the height of each recorded in inches. The ultimate aim will be an
inference about the average height of the whole population; but first we must
efficiently summarize and describe our sample.
FIG. 2-1 Frequency and relative frequency distribution of the results of a sample of 50
tosses of a die. (Horizontal axis: number of dots; vertical axes: frequency and relative
frequency.)



In this example, height (in inches) is our random variable X. In this 
case, X is continuous; thus an individual's height might be any value, such 
as 64.328 inches. 2 It no longer makes sense to talk about the frequency of 
this specific value of X; chances are we'll never again observe anyone exactly 
64.328 inches tall. Instead we can tally the frequency of heights within a 

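Table 2-3 Frequency, and Relative Frequency of the Heights of a Sample
of 200 Men

        (1)           (2)      (3)                                                     (4)          (5)
Cell    Cell          Cell                                                             Frequency,   Relative Frequency
No.     Boundaries    Midpt    Tally                                                   f            f/n

 1      55.5-58.5     57       ||                                                        2          .010
 2      58.5-61.5     60       ||||| ||                                                  7          .035
 3      61.5-64.5     63       ||||| ||||| ||||| ||||| ||                               22          .110
 4      64.5-67.5     66       ||||| ||||| |||                                          13          .065
 5      67.5-70.5     69       ||||| ||||| ||||| ||||| ||||| ||||| ||||| ||||| ||||     44          .220
 6      70.5-73.5     72       ||||| ||||| ||||| ||||| ||||| ||||| ||||| |              36          .180
 7      73.5-76.5     75       ||||| ||||| ||||| ||||| ||||| ||||| ||                   32          .160
 8      76.5-79.5     78       ||||| ||||| |||                                          13          .065
 9      79.5-82.5     81       ||||| ||||| ||||| ||||| |                                21          .105
10      82.5-85.5     84       ||||| |||||                                              10          .050

                                                        $\sum f = 200 = n$    $\sum (f/n) = 1.00$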



2 We shall overlook the fact that although height is conceptually continuous, in practice 
the measured height is rounded to a few decimal places at most, and is therefore discrete. 
class or cell (e.g., 58.5" to 61.5") as in column 3 of Table 2-3. Then the
frequency and relative frequency are tabulated as before.

The cells have been chosen somewhat arbitrarily, but with the following
conveniences in mind.

1. The number of cells is a reasonable compromise between too much
detail and too little.

2. Each cell midpoint, which hereafter will represent all sample values
in the cell, is a convenient whole number.
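
Sorting an observation into its cell amounts to one subtraction and one division. The following sketch assumes the cell layout of Table 2-3 (ten cells of width 3 inches starting at 55.5); the function name `cell_of` and the rule that a value exactly on a boundary falls in the upper cell are our own choices for illustration.

```python
import math

LOWER_BOUND = 55.5   # lower boundary of cell 1 in Table 2-3
WIDTH = 3.0          # every cell is 3 inches wide
N_CELLS = 10

def cell_of(height):
    """Return (cell number, cell midpoint) for a height in inches."""
    index = math.floor((height - LOWER_BOUND) / WIDTH)  # 0-based cell index
    if not 0 <= index < N_CELLS:
        raise ValueError(f"{height} lies outside the tabulated range")
    midpoint = LOWER_BOUND + WIDTH * index + WIDTH / 2
    return index + 1, midpoint

print(cell_of(64.328))   # the height mentioned in the text
print(cell_of(71.46))
```

Once sorted, each observation is represented by its cell midpoint, which is why whole-number midpoints are convenient.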



The grouping of the 200 observations into cells is illustrated in Figure
2-2, where each observation is represented by a dot. For simplicity, we have
assumed that the observations are recorded exactly, rather than being
rounded off. (Rounding, to the nearest integer, for example, may in fact be
regarded as a preliminary grouping into cells of width 1.)

FIG. 2-2 The grouping of observations into cells, illustrating the first two columns of
Table 2-3. (Horizontal axis: height, with cell midpoints at 57, 60, . . . , 84.)

The grouped data is then graphed in Figure 2-3. This frequency
distribution, or so-called histogram, uses bars to represent frequencies as a
reminder that the observations occurred throughout the cell, and not just
at the midpoint.

We now turn to the question of how we may characterize a sample
frequency distribution with a single descriptive measure, or sample statistic.



FIG. 2-3 The frequency and relative frequency distribution of a sample of 200 men.
(Horizontal axis: height; vertical axes: frequency and relative frequency.)

In fact, there are two highly useful descriptions: the first is the central point
of the distribution and the second is its spread. 



2-3 CENTERS (MEASURES OF LOCATION) 

There are several different concepts of the "center" of a frequency 
distribution. Three of these, the mode, the median, and the mean, are 
discussed below. We shall start with the simplest. 



(a) The Mode 3 

This is defined as the most frequent value. In our example of heights, 
the mode is 69 inches, since this cell has the greatest frequency, or highest 
bar in Figure 2-3. Generally, the mode is not a good measure of central 
tendency, since it often depends on the arbitrary grouping of the data. 
(The student will note that, by redefining cell boundaries, the mode can be 
shifted up or down considerably.) It is also possible to draw a sample where 
the largest frequency (highest bar in the group) occurs at two (or even more) 
heights; this unfortunate ambiguity is left unresolved, and the distribution 
is "bimodal." 



(b) The Median 

This is the 50th percentile; i.e., the value below which half the values 
in the sample fall. Since it splits the observations into two halves, it is some- 
times called the middle value. In the sample of 200 shown in Figure 2-2, 
the median (say, 71.46) is most easily derived by reading off the 100th value 4 
from the left; but if the only information available is the frequency distribu- 
tion in Figure 2-3, it must be calculated choosing an appropriate value 
within the median cell. 5 

3 "Mode" means fashion, in French. 

4 Or 101st value. This ambiguity is best resolved by defining the median as the average of 
the 100th and 101st values. In a sample with an odd number of observations, this ambiguity 
does not arise. 

5 The median cell is clearly the 6th, since this leaves 44% (i.e., 88) of the sample values 
below and 38% (i.e., 76) above. The median value can be closely approximated by moving 
through this median cell from left to right to pick up another 6% of the observations. 
Since this cell includes 18% of the observations, we move 6/18 of the way through this 
cell. Thus our median approximation is 70.5 + (6/18 × 3) = 71.5.
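
The interpolation described in footnote 5 can be written out as a short sketch. The boundaries and frequencies are those of Table 2-3, with the frequency of cell 2 taken as 7, consistent with its relative frequency of .035; the function name `grouped_median` is ours.

```python
# Cell lower boundaries and frequencies from Table 2-3 (cell width 3 inches).
lower_bounds = [55.5, 58.5, 61.5, 64.5, 67.5, 70.5, 73.5, 76.5, 79.5, 82.5]
frequencies = [2, 7, 22, 13, 44, 36, 32, 13, 21, 10]
WIDTH = 3.0

def grouped_median(lower_bounds, frequencies, width):
    """Interpolate the median within the cell containing the middle observation."""
    half = sum(frequencies) / 2
    below = 0  # observations in cells to the left of the current one
    for lower, f in zip(lower_bounds, frequencies):
        if below + f >= half:
            # Move through this cell far enough to pick up the
            # remaining observations needed to reach the halfway mark.
            return lower + (half - below) / f * width
        below += f
    raise ValueError("frequencies must be non-empty and positive")

print(grouped_median(lower_bounds, frequencies, WIDTH))
```

Here 88 observations lie below the 6th cell, so 12 more of its 36 are needed; 12/36 is the same fraction as the 6/18 used in the footnote.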



(c) The Mean ($\bar{X}$)

This is sometimes called the arithmetic mean, or simply the average.
This is the most common central measure. The original observations ($X_1$,
$X_2, \ldots, X_n$) are simply summed, then divided by $n$. Thus

$\bar{X} \triangleq \frac{1}{n}(X_1 + X_2 + \cdots + X_n)$    (2-1a)

where $X_i$ represents the $i$th value of $X$, and $\triangleq$ means "equals, by definition."

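The average height of our sample could be computed by summing all
200 observations and dividing by 200. However, this tedious calculation can
be greatly simplified by using the grouped data in Table 2-3. Let $f_1$ represent
the number of observations in cell 1, where each observation may be
approximated 6 by the cell midpoint, $x_1$. Similar approximations hold for all
the other cells too, so that

$\bar{X} \approx \frac{1}{n}\Big[\underbrace{(x_1 + x_1 + \cdots + x_1)}_{f_1 \text{ times}} + \underbrace{(x_2 + x_2 + \cdots + x_2)}_{f_2 \text{ times}} + \cdots + \underbrace{(x_{10} + \cdots + x_{10})}_{f_{10} \text{ times}}\Big]$

where $\approx$ represents approximate equality; it follows that

$\bar{X} \approx \frac{1}{n}\big[x_1 f_1 + x_2 f_2 + \cdots + x_{10} f_{10}\big]$

In general,

$\bar{X} \approx \frac{1}{n}\sum_{i=1}^{m} x_i f_i = \sum_{i=1}^{m} x_i \left(\frac{f_i}{n}\right)$    (2-1b)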



6 In approximating each observed value by the midpoint of its cell, we sometimes err
positively, sometimes negatively; but unless we are very unlucky, these errors will tend
to cancel. Even in the unluckiest case, however, the error must be smaller than half the cell
width. Note that cell midpoints are designated by the small $x_i$, to distinguish them from
the observed values $X_i$.
Table 2-4 Calculation of Mean and Mean Squared Deviation of a Sample of 200 Men's Heights a

Given: columns (1) and (2).
Calculation of $\bar{X}$ using (2-1b): column (3).
Calculation of MSD using (2-5b), and $s^2$: columns (4) to (6).
Easier calculation of $s^2$ using (2-7b): columns (7) and (8).

(1)      (2)         (3)              (4)                  (5)                    (6)                           (7)        (8)
$x_i$    $f_i/n$     $x_i(f_i/n)$     $(x_i - \bar{X})$    $(x_i - \bar{X})^2$    $(x_i - \bar{X})^2(f_i/n)$    $x_i^2$    $x_i^2 f_i$

57       .010         .570            -14.80               218.                   2.18                          3,249       6,498
60       .035        2.100            -11.80               139.                   4.88                          3,600      25,200
63       .110        6.930             -8.80                77.                   8.47                          3,969      87,318
. . .
84       .050        4.200            +12.20               149.                   7.45                          7,056      70,560

$\bar{X} \approx \sum x_i (f_i/n) = 71.80$

$\text{MSD} \approx \sum (x_i - \bar{X})^2 (f_i/n) = 40.0$

Comparing (2-5a) and (2-6):

$s^2 = \text{MSD}\left(\frac{n}{n-1}\right) = 40.0\left(\frac{200}{199}\right) = 40.2$

Easier calculation:

$\sum x_i^2 f_i = 1{,}039{,}000$
$n\bar{X}^2 = 1{,}031{,}000$
$s^2 = \frac{1}{199}(8{,}000) = 40.2$
$s = \sqrt{40.2} \approx 6.34$

a Warning. This computation is very tedious. Just verify a few calculations rather than all the detail, since an easier method is
shown later in Table 2-5.
where $(f_i/n)$ = relative frequency in the $i$th cell, and $m$ = number of cells.
We number this equation (2-1b) to emphasize that it is the equivalent
formulation of (2-1a), appropriate for grouped data. In our example, the
calculation of (2-1b) is based on the data in Table 2-3, and is shown in
column 3 of Table 2-4. We can think of this as a "weighted" average, with
each $x$ value weighted appropriately by its relative frequency.
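
The weighted average in (2-1b) can be verified directly from the grouped data. The sketch below uses the midpoints and frequencies of Table 2-3 (cell 2 taken as 7 observations, consistent with its relative frequency of .035); the variable names are ours.

```python
# Cell midpoints x_i and frequencies f_i from Table 2-3.
midpoints = [57, 60, 63, 66, 69, 72, 75, 78, 81, 84]
frequencies = [2, 7, 22, 13, 44, 36, 32, 13, 21, 10]

n = sum(frequencies)
relative = [f / n for f in frequencies]          # f_i / n

# Equation (2-1b): each midpoint weighted by its relative frequency.
mean = sum(x * r for x, r in zip(midpoints, relative))

# The same sum taken the other way round: (1/n) * sum of x_i * f_i.
mean_alt = sum(x * f for x, f in zip(midpoints, frequencies)) / n

print(n, f"{mean:.3f}", f"{mean_alt:.3f}")
```

Both forms give about 71.8 inches, the value shown at the foot of column 3 in Table 2-4.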



(d) Comparison of Mean, Median and Mode 

These three measures of center are compared in Figure 2-4. In part a we
show a distribution which has a single peak and is symmetric (i.e., one half
is the mirror image of the other); in this case all three central measures
coincide. But when the distribution is skewed to the right as in b, the median
falls to the right of the mode; with the long scatter of observed values strung
out in the right hand tail, it is generally necessary to move from the mode
to the right to pick up half the observations. Moreover, the mean will generally
lie even further to the right, as explained in the next section.

FIG. 2-4 (a) A symmetric distribution with a single peak. The mode, median, and mean
coincide at the point of symmetry. (b) A right-skewed distribution, showing mode <
median < mean.

Interpreting the Mean by an Analogy from Physics. The 200 observations 
in the sample of heights appear in Figure 2-2 as points along the X-axis. If 
we think of these observations as masses (each observation a one pound 
mass, for example), and the X-axis as a weightless supporting rod, we might 
ask where this rod balances. Our intuition suggests "the center." 

The precise balancing point, also called the center of gravity, is given by
the formula

$\frac{1}{n}(X_1 + X_2 + \cdots + X_n)$

which is exactly the formula for the mean. Thus we are quite justified in
thinking of the sample mean as the "balancing point" of the data, and
representing it in graphs as a fulcrum.

It can easily be seen why the mean lies to the right of the median in a
right-skewed distribution, as shown in Figure 2-4b. Experiment by trying to
balance at the median. Fifty percent of the observed values now lie on either 
side, but the observations to the right tend to be further distant, tilting the 
distribution to the right. Balance can be achieved only by placing the fulcrum 
(mean) to the right of the median. 
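
A small numerical experiment illustrates both points: the ordering of the three centers in a right-skewed sample, and the mean as balancing point. The ten values below are hypothetical, chosen only so that a few large observations stretch out the right-hand tail.

```python
import statistics

# Hypothetical right-skewed sample: most values small, a few large.
sample = [1, 2, 2, 2, 3, 4, 5, 6, 9, 16]

mode = statistics.mode(sample)
median = statistics.median(sample)
mean = statistics.mean(sample)

# Deviations from the mean sum to zero: the rod balances at the mean.
balance_at_mean = sum(x - mean for x in sample)

# Deviations from the median do not: the rod tilts to the right.
balance_at_median = sum(x - median for x in sample)

print("mode, median, mean:", mode, median, mean)
print("net deviation about the mean  :", balance_at_mean)
print("net deviation about the median:", balance_at_median)
```

The positive net deviation about the median is the "tilt to the right" described above; it vanishes only when the fulcrum is moved to the mean.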

PROBLEMS 

2-1 Show the mean, mode, and median in our example in Figure 2-3. Is the
mode a good central measure in this case?

2-2 Find the mean, median, and mode of the following sample of litter sizes.
Graph the frequency distribution.

7 4 10 9 15 12 7
8 11 4 14 10 5 14
1 10 8 12 6 5

2-3 Sort the following data into 8 cells, whose midpoints are 55, 60, . . . , 90.

55.31 81.47 64.90 70.88 86.02 77.25 76.73 84.21 56.02
84.92 90.23 78.01 88.05 73.37 87.09 57.41 85.43
74.76 86.51 86.37 76.15 88.64 84.71 66.05 83.91

Approximately what are the mean, median, and mode? Graph the
frequency distribution.

2-4 Sort the data of Problem 2-3 into 4 cells, whose midpoints are 60, 70,
80, 90. Then answer the same questions as in Problem 2-3.
2-5 Summarize the answers to the previous two problems in the following 



table. 



Mean Median 



Mode 



Original data 
(exact values) 

— Fine grouping 
(Problem 2-3) 

— Coarse grouping 
(Problem 2-4) 



77.78 



81.47 



Not defined 



\ 

(a) Do you see why the mode is not a good measure? 

(b) Will coarse grouping always give worse approximations (for the 
mean and median) than fine grouping, or will it do so usually? 



2-4 DEVIATIONS (MEASURES OF SPREAD) 



Although average height may be the most important characteristic
(statistic) of the sample, it is also important to know how spread out or varied
the sample observations are.

As with measures of center, we find that there are several measures of
spread; we start with the simplest.

(a) The range is simply the distance between the largest and smallest
value.

    Range = largest observation − smallest observation

For men's heights, the range is 30. It may be fairly criticized on the grounds
that it tells us nothing about the distribution except where it ends. And these
two extreme values may be very unreliable. We therefore turn to measures of
spread which take account of all observations.

The average deviation, as its name implies, is found by calculating the
deviation of each observed value ($X_i$) from the mean ($\bar{X}$); these deviations
($X_i - \bar{X}$) are then averaged by summing and dividing by n. Although this
sounds like a promising measure, in fact it is worthless; positive deviations
always cancel negative deviations, leaving an average of zero.⁷ This sign
problem can be avoided by ignoring all negative signs and taking the average
of the absolute values of the deviations, as follows.

(b) Mean Absolute Deviation $= \frac{1}{n} \sum_{i=1}^{n} |X_i - \bar{X}|$        (2-4)

Intuitively, this is a good measure of spread; the problem is that it is
mathematically intractable.⁸ We therefore turn to an alternative means of
avoiding the sign problem, namely, squaring each deviation.



(c) Mean Squared Deviation (MSD) $= \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2$        (2-5a)

If we use grouped data as in Table 2-4, this formula becomes

    Mean Squared Deviation $\simeq \frac{1}{n} \sum (x_i - \bar{X})^2 f_i$        (2-5b)

This is a good measure, provided we wish only to describe the sample. But
typically we shall want to go one step further, and use this to make a statistical
inference about the population. For this purpose it is better to use the divisor
n − 1 rather than n.⁹ The resulting sample statistic is referred to as the
variance.



(d) Variance, $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$        (2-6)

7 This may be proved as follows:

    Average deviation $= \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})$        (2-2)

    $= \frac{1}{n} \sum_{i=1}^{n} X_i - \frac{1}{n}(n\bar{X})$

    $= \bar{X} - \bar{X} = 0$

    Average deviation = 0        (2-3)

8 One difficulty is the problem of differentiating the absolute value function.

9 Technically, this makes the sample variance an unbiased estimator of the population
variance. See Chapter 7.






The values of MSD and $s^2$ are calculated in Table 2-4, again exploiting
the simplicities of grouped data.¹⁰

(e) Standard deviation is s, the square root of the variance.

    Standard Deviation, $s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2}$        (2-10)

Note that by taking the square root, we compensate for having squared
terms in defining variance in (2-6), so that s is reduced to the same units as the
X observations.
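The measures (a) through (e) can be compared side by side on a small sample. The following is a minimal sketch, assuming a hypothetical six-value sample chosen only to keep the arithmetic easy; the variable names are illustrative, and `pvariance` and `variance` from Python's standard `statistics` module are used as a cross-check on the MSD and on $s^2$ respectively.

```python
from math import sqrt
from statistics import mean, pvariance, variance

# Hypothetical sample, chosen only so the arithmetic is easy to follow.
x = [2, 4, 4, 5, 7, 8]
n = len(x)
x_bar = mean(x)

value_range = max(x) - min(x)                       # (a) range
avg_dev = sum(xi - x_bar for xi in x) / n           # average deviation: always 0
mad = sum(abs(xi - x_bar) for xi in x) / n          # (b) mean absolute deviation
msd = sum((xi - x_bar) ** 2 for xi in x) / n        # (c) divisor n
s2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)   # (d) divisor n - 1
s = sqrt(s2)                                        # (e) standard deviation

# Shortcut form: sum of squares minus n times the squared mean.
s2_short = (sum(xi ** 2 for xi in x) - n * x_bar ** 2) / (n - 1)

print(value_range, avg_dev, mad, msd, s2, s)
print(pvariance(x), variance(x), s2_short)
```

The divisor is the only difference between (c) and (d), so the MSD is always slightly smaller than $s^2$ for the same data.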

In conclusion, the sample mean $\bar{X}$ is the most common measure of center,
and the sample standard deviation s the most common measure of spread.
Borrowing the language of physics, we refer to $\bar{X}$ and $s^2$ as the first and
second moments of the sample.

PROBLEMS

2-6 Compute the variance of the data of Problem 2-2:

    (a) Using the definition (2-6). (b) Using the easier formula (2-7).

2-7 For the grouped data of Problem 2-4, compute the range, mean absolute
    deviation, and standard deviation.

(2-8)¹ For the grouped data of Problem 2-3, compute the standard deviation.

1 Problems with parentheses are optional, since they closely parallel problems
the student has already done.

2-5 LINEAR TRANSFORMATIONS (CODING) 

(a) Change of ' Origin 

Suppose that the men's heights in our example are measured relative to
a "norm" of 69 inches (i.e., 5 feet 9 inches). Since $X_i$ denotes the old height



10 Strictly to simplify the calculations, $s^2$ is often computed from the following formula.

    $s^2 = \frac{1}{n-1} \left[ \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 \right]$        (2-7a)

For grouped data this becomes

    $s^2 \simeq \frac{1}{n-1} \left[ \sum x_i^2 f_i - n\bar{X}^2 \right]$        (2-7b)

This computation is shown in the last two columns of Table 2-4.

Proof that (2-6) and (2-7a) are equivalent. We may forget the common divisor (n − 1),
and merely prove

    $\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2$        (2-8)

The left side is

    $\sum (X_i - \bar{X})^2 = \sum (X_i^2 - 2X_i\bar{X} + \bar{X}^2)$
    $= \sum X_i^2 - 2\bar{X}(n\bar{X}) + n\bar{X}^2$
    $= \sum X_i^2 - n\bar{X}^2$ = right side        (2-9)

in inches (e.g., 83), let $X'_i$ denote the new measure (e.g., 14). The two measures
are related by the equation

    $X'_i = X_i - 69$        (2-11)

In nonmathematical terms, this new measurement is simply "the number of
inches an individual is taller (+) or shorter (−) than 69 inches." It is easy to
guess that the mean using this new measure is just 69 less than the mean using
the old measure, i.e.:

    $\bar{X}' = \bar{X} - 69$        (2-12)

On the other hand, the spread of our observations will be exactly the same,
regardless of which measurement is used, i.e.:

    $s_{X'} = s_X$        (2-13)



FIG. 2-5 Change of origin (shift).

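These two points are illustrated in Figure 2-5 and may be stated in theorem
form as follows:¹¹

Theorem I

    If   $X'_i = X_i - a$        (2-14)
    then $\bar{X}' = \bar{X} - a$        (2-15)
    and  $s_{X'} = s_X$        (2-16)

11 Proof: To prove (2-15) consider

    $\bar{X}' = \frac{1}{n} \sum_{i=1}^{n} X'_i$

but from (2-14)

    $\bar{X}' = \frac{1}{n} \sum_{i=1}^{n} (X_i - a)$
    $= \frac{1}{n} \left( \sum X_i \right) - \frac{1}{n}(na)$
    $= \bar{X} - a$

To prove (2-16) it will be enough to prove the equality of variances.

    $s_{X'}^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X'_i - \bar{X}')^2$

By (2-14) and (2-15)

    $s_{X'}^2 = \frac{1}{n-1} \sum_{i=1}^{n} [(X_i - a) - (\bar{X} - a)]^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 = s_X^2$

(b) Change of Scale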



Letting $X^*_i$ denote the new height in trintals (units of 3 inches), an old height of
$X_i = 81$ inches would be converted to $X^*_i = 27$ trintals, and generally

    $X^*_i = \frac{1}{3} X_i$        (2-17)

It is no surprise that the mean height in trintals is just 1/3 the mean height
in inches, i.e.:

    $\bar{X}^* = \frac{1}{3} \bar{X}$        (2-18)

Furthermore, the standard deviation will also be 1/3 as much, i.e.:

    $s_{X^*} = \frac{1}{3} s_X$        (2-19)

FIG. 2-6 Change of scale (shrink).

These two points can be stated generally as

Theorem II

    If   $X^*_i = bX_i$        (2-20)
    then $\bar{X}^* = b\bar{X}$        (2-21)
    and  $s_{X^*} = |b| \, s_X$        (2-22)

The proof parallels that of Theorem I directly above, and is left as an exercise
for the student.

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(c) General Linear Transformations 

It is now appropriate to combine the above two theorems into one.
Consider the general linear¹² transformation:

Theorem III

    If   $Y_i = a + bX_i$        (2-23)
    then $\bar{Y} = a + b\bar{X}$        (2-24)
    and  $s_Y = |b| \, s_X$        (2-25)

Again the proof is similar to Theorem I above, and is left as an exercise
for the student. This theorem may be interpreted very simply: if the individual
observations ($X_i$) are linearly transformed (into corresponding $Y_i$ values),
then the mean observation is transformed in exactly the same way, and the
standard deviation is stretched (or shrunk)¹³ by the factor |b|, with no effect
from a.

(d) Application to Coding

In future chapters we shall draw upon this theory of linear transformations
in various contexts. However, it does have one immediate use; it can be
applied to find a simpler computation of $\bar{X}$ and $s_X$ than that shown in Table
2-4. This involves three steps.

1. Code all the $X_i$ values into a new set of $Y_i$ values. Our computations
will be most simplified if we use the formula

    $Y_i = \frac{X_i - \text{one of the cell midpoints}}{\text{cell width}}$        (2-26)

In our example of students' heights, this becomes

    $Y_i = \frac{X_i - 69}{3}$        (2-27)

This is clearly a linear transformation of the form of (2-23), with

    $a = -\frac{69}{3} = -23$ and $b = \frac{1}{3}$

12 Called linear because, given any values of a and b, the graph of Y = a + bX is a straight
line (with slope b and Y-intercept a).

13 More precisely, stretched if |b| > 1, but shrunk if |b| < 1.
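Theorem III can be checked numerically before it is used for coding. The sketch below assumes a small hypothetical set of heights and a helper, `theorem_iii_errors`, introduced here only for illustration; it compares the mean and standard deviation of the transformed values with the values the theorem predicts.

```python
from statistics import mean, stdev

# Hypothetical heights in inches (not the sample of Table 2-4).
x = [63, 66, 69, 69, 72, 75, 81]


def theorem_iii_errors(xs, a, b):
    """Return the discrepancies from (2-24) and (2-25) for Y = a + b*X."""
    ys = [a + b * xi for xi in xs]
    mean_error = mean(ys) - (a + b * mean(xs))
    sd_error = stdev(ys) - abs(b) * stdev(xs)
    return mean_error, sd_error


# Change of origin, change of scale, the coding (2-27), and a negative b.
for a, b in [(-69, 1), (0, 1 / 3), (-23, 1 / 3), (10, -2)]:
    print(a, b, theorem_iii_errors(x, a, b))
```

Both discrepancies are zero up to floating-point rounding in every case, including the negative value of b, which is why the absolute value |b| appears in (2-25).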
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Table 2-5 Coded Computation of Mean and Standard Deviation of a 
Sample of 200 Men's Heights (Compare with Table 2-4)

    (1)       (2)                    (3)      (4)           (5)
    Given     Coding                          For $\bar{Y}$       For $s_Y$, using easy
                                                            calculation (2-7)
    $x_i$     $y_i = (x_i - 69)/3$   $f_i$    $y_i f_i$     $y_i^2 f_i$

    57        -4                      2        -8            32
    60        -3                      7       -21            63
    63        -2                     22       -44            88
    66        -1                     13       -13            13
    69         0                     44         0             0
    72         1                     36        36            36
    75         2                     32        64           128
    78         3                     13        39           117
    81         4                     21        84           336
    84         5                     10        50           250

    Sums                        n = 200       187          1063

    $\bar{Y} = 187/200 = .935$
    $\bar{X} = 3\bar{Y} + 69 = 71.80$

    $\sum y_i^2 f_i = 1063$
    $n\bar{Y}^2 = 175$
    $s_Y^2 = \frac{1}{199}(888) = 4.46$
    $s_X = 3 s_Y = 3\sqrt{4.46} = 6.35$

FIG. 2-7 Coding from inches (X) into trintals (Y), involving both a change of origin
and of scale.

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Moreover, it is evident that when X i = 69, Y i = 0. Furthermore, as X i 
progresses by steps of 3, $Y_i$ progresses in unit steps. With these guidelines
we can fill in the appropriate Y values in column 2 of Table 2-5; diagrammatically,
this coding is illustrated in Figure 2-7.

2. Compute the mean and standard deviation of the Y values. We note
in the successive columns of Table 2-5 how easily this is now done. With
$\bar{Y}$ and $s_Y$ now in hand, we are in a position to:

3. Translate this mean and standard deviation back into X values. This
involves applying the theory of linear transformations (Theorem III) to
(2-27).

    $\bar{Y} = \frac{\bar{X} - 69}{3}$        (2-28)

and

    $s_Y = \frac{1}{3} s_X$        (2-29)

From (2-28)

    $\bar{X} = 3\bar{Y} + 69$        (2-30)
    $= 71.80$

From (2-29)

    $s_X = 3 s_Y$        (2-31)
    $= 6.35$

Thus the simple coded computation of $\bar{X}$ and $s_X$ is complete.
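The three coding steps can be traced in a few lines. This is a minimal sketch using the cell midpoints and frequencies of Table 2-5; the variable names are illustrative, and the direct (uncoded) computation at the end is added here only as a cross-check.

```python
from math import sqrt

midpoints = [57, 60, 63, 66, 69, 72, 75, 78, 81, 84]   # x_i, column (1)
freqs = [2, 7, 22, 13, 44, 36, 32, 13, 21, 10]         # f_i, column (3)
n = sum(freqs)

# Step 1: code each midpoint with Y = (X - 69) / 3, as in (2-27).
coded = [(x - 69) // 3 for x in midpoints]             # -4, -3, ..., 5

# Step 2: mean and standard deviation of the coded values, using (2-7b).
sum_yf = sum(y * f for y, f in zip(coded, freqs))
sum_y2f = sum(y * y * f for y, f in zip(coded, freqs))
y_bar = sum_yf / n
s_y = sqrt((sum_y2f - n * y_bar ** 2) / (n - 1))

# Step 3: translate back into inches, as in (2-30) and (2-31).
x_bar = 3 * y_bar + 69
s_x = 3 * s_y

# Cross-check: the same statistics computed directly from the midpoints.
x_bar_direct = sum(x * f for x, f in zip(midpoints, freqs)) / n
s_x_direct = sqrt(
    sum(f * (x - x_bar_direct) ** 2 for x, f in zip(midpoints, freqs)) / (n - 1)
)

print(sum_yf, sum_y2f, y_bar)
print(x_bar, s_x)
print(x_bar_direct, s_x_direct)
```

The coded and direct results agree. Carried without intermediate rounding, the standard deviation comes out at about 6.34, marginally below the 6.35 shown in Table 2-5; the difference appears to come from rounding in the hand calculation.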



PROBLEMS 

(2-9) By coding the heights shown in Table 2-5 from inches (X) into feet
(Y), compute $\bar{X}$ and $s_X$. Show your linear transformation with a
diagram similar to Figure 2-7. Why is the coding used in the text
preferred?

2-10 Use coding to find the mean and standard deviation of the data in
Problem 2-4.
2-11 Find the mean of the following: 

239510 239250 239860 239360 
239480 239430 239230 239680 
239370 239290 239850 

(Hint. It is natural to simply drop the first 3 digits of every 
number, and just work with the numbers 510, 250, .... This is 
mathematically justified — it is just the linear transformation Y = X — 
239,000) 
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2-12 To show that nonlinear transformations are trickier, see if this is true. 
If $Y = X^2$, then $\bar{Y} = \bar{X}^2$, when there are three values of $X_i$: 1, 3, 5.
(2-13) Using coding, find the mean and standard deviation of 

(a) The data of Problem 2-3. 

(b) The data of Problem 2-2. 

(2-14) Find the mean and standard deviation of the following sample of 50 
executive ages. Graph the relative frequency distribution. 

35 46 63 69 54 50 62 68 38 40 

55 43 42 59 45 44 57 47 48 46 

43 64 49 36 59 60 42 60 42 38 

51 50 66 63 57 56 51 38 61 54 

50 44 48 69 64 37 56 53 62 52 



Review Problems 



2-15 The weekly wage rates for 5 major industrial groupings are listed
below. Find the average weekly wage.

    Industry            A       B      C      D      E
    % of employment     30%     25     20     20     5
    Weekly wage         $120    150    120    100    80



2-16 Suppose the number of children was recorded for each of 25 families,
obtaining the following data:

2, 4, 1, 0, 1, 3, 0, 4, 2, 6, 0, 0, 2, 3, 1, 5, 4,
0, 3, 1, 2, 5, 3, 4, 1.



(a) Construct a frequency table and graph.
(b) Find the sample mean and standard deviation.

(2-17) The following table* gives the actual percent of farmland that was
harvested (as opposed to pasture, woodlot, etc.) in the U.S.A. in 1959,
according to region. Compute the percentage harvested in the U.S.A.
as a whole.

    Region      Amount of Farmland      Percent Harvested
                (millions of acres)

    North         421                   [illegible]
    South         357                   21.0
    Mountain      264                    8.7
    Pacific        80                   18.8

    U.S.A.      1,122                   ?

* Source. Statistical Abstract of the United States, 1963, pp. 625, 614.



2-18 A certain species of beetle was sampled, yielding the following 10 
lengths, in centimeters: 1.5, 1.0, 1.2, 1.0, 1.1, 1.0, 1.6, 1.2, 1.4, 2.0. 
Find the median, mean, range, variance, and standard deviation 

(a) For the original lengths. 

(b) If the lengths are expressed in mm (1 cm = 10 mm). 

(c) If the lengths are expressed as "centimeters above a standard beetle 
height of 1.1," (i.e., the sample values become + .4, —.1, .1, — .1, 

...). 

2-19 Throw a die 100 times (or else simulate this by consulting the random
numbers in Appendix Table IIa). Graph the relative frequency distribution,
and calculate the sample mean

(a) After 10 throws; 

(b) After 25 throws; 

(c) After 100 throws; 

(d) After millions of throws (guess). 



chapter 3 



Probability 



3-1 INTRODUCTION 

In the next four chapters we make deductions about a sample from a
known population; this is a necessary prelude to the induction involved in
Chapters 7 to 10, where we shall make inferences about an unknown population
from an observed sample.

If the population of American voters is 55% Democrat, we cannot be
certain that exactly the same percentage of Democrats will occur in a random
sample. Nevertheless, it is "likely" that "close to" this percentage will turn
up in our sample. Our objective is to define "likely" and "close to" more
precisely; in this way we shall be able to make useful predictions. First,
however, we must lay a good deal of groundwork. Predictions in the face
of uncertainty or chance require a knowledge of the laws of probability, and
this chapter is devoted exclusively to their development. We start with the
simplest examples: tossing coins and rolling dice.

Consider again our example in Chapter 1, in which the reader gambled
against rolling an ace on a die. This gamble was based on the judgement that
this outcome was unlikely. Now let's be more specific, and try to define its
probability precisely. Intuitively, since this is but one of six equally probable
outcomes, we might (correctly) guess its probability to be one in six, or
one-sixth, provided it is an honest die. Alternatively we might say that if
the die were thrown a large number of times, the relative frequency (of rolling
an ace) would approach one-sixth (as in Problem 2-19). This is a useful
operational approach; thus, if we suspect that this die is not, in fact, a fair
one, we could test by tossing it many times, and observing whether or not
the relative frequency of this outcome approached one-sixth.

This definition of probability as "the limit of relative frequency" is
formally stated as:

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Definition 



    $\Pr(e_1) = \lim_{n \to \infty} \frac{n_1}{n}$        (3-1)

where $e_1$ is the outcome ("ace")
      n is the total number of times that the trial is repeated (die is thrown)
      $n_1$ is the number of times that the outcome $e_1$ occurs [also called $n(e_1)$ or
      the frequency f of $e_1$]
      $n_1/n$ is therefore the relative frequency of $e_1$



We shall use this definition of probability because it provides the clearest 
intuitive idea. However, you will find in Section 3-6 that it involves conceptual 
difficulties; thus, if you choose to study probability further you will soon be 
forced to turn to the axiomatic approach. 
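The limiting behavior in (3-1) can be imitated by simulation. The sketch below is an illustration only: it assumes that Python's pseudo-random generator is an acceptable stand-in for an honest die, and `relative_frequency` is a name introduced here. A fixed seed is used so that the run is repeatable.

```python
import random


def relative_frequency(n_throws, face=1, seed=0):
    """Relative frequency n_1/n of `face` in n_throws simulated die throws."""
    rng = random.Random(seed)
    n_1 = sum(1 for _ in range(n_throws) if rng.randint(1, 6) == face)
    return n_1 / n_throws


# The relative frequency of an ace settles down as n grows.
for n in (10, 100, 1000, 100_000):
    print(n, round(relative_frequency(n), 4))

print("one-sixth =", round(1 / 6, 4))
```

For small n the relative frequency wanders considerably; for large n it stays close to one-sixth, although no finite run can establish the limit itself.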



PROBLEMS 



3-1 (a) Throw a thumbtack 50 times. Define tossing "the point up" as $e_1$.
Record your results as in the following table, and keep a permanent
record for future reference.

    Trial Number    Point Up?    Accumulated Frequency    Relative Frequency
    (n)                          of "Ups" ($n_1$)         ($n_1/n$)

     1              No           0                        .00
     2              Yes          1                        .50
     3              Yes          2                        .67
     4              No           2                        .50
     5              Yes          3                        .60
     .
     .
    10
    20
    30
    40
    50


(b) Show your results on the following graph:

    [Graph: relative frequency $n_1/n$ on the vertical axis (0 to 1.0) against
    the number of trials n on the horizontal axis (up to 50).]

(c) What is your best guess of the probability of tossing the point up?
Of tossing the point down?

3-2 In tossing a coin, define a "head" as $e_1$, and proceed as in 3-1(a) and
(b), tossing it 100 times. (Record your results for use in Chapter 9.)

3-3 Roll a die 100 times. Define rolling a four as $e_4$, and proceed as in
3-1(a) and (b). (Record your results for future use. You may use the
same data as in Problem 2-19.)

3-4 Roll a pair of dice, and define the event E to occur if you get a total of
7 or 11. Repeat 50 times, as in 3-1(a) and (b). What is your estimate
of Pr(E)? Can you derive Pr(E) theoretically, in order to be exact,
and also save the empirical work?



3-2 ELEMENTARY PROPERTIES OF PROBABILITY 

We generalize by considering an experiment with N elementary outcomes
($e_1, e_2, \ldots, e_i, \ldots, e_N$). The relative frequency $n_i/n$ of any outcome $e_i$
cannot be negative, since the numerator is a count and the denominator is positive;
moreover, since the numerator cannot exceed the denominator, relative
frequency cannot exceed 1. Thus

    $0 \le \frac{n_i}{n} \le 1$

The same relations are true in the limit, so that from (3-1)

    $0 \le \Pr(e_i)$        (3-2)

and

    $\Pr(e_i) \le 1$        (3-3)
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Next we note that the frequencies of all possible outcomes sum to n. 

    $n_1 + n_2 + \cdots + n_N = n$

Dividing this equation by n, we find that all the relative frequencies sum to 1.

    $\frac{n_1}{n} + \frac{n_2}{n} + \cdots + \frac{n_N}{n} = 1$

This same relation is true in the limit, so that

    $\Pr(e_1) + \Pr(e_2) + \cdots + \Pr(e_N) = 1$        (3-4)



3-3 EVENTS AND THEIR PROBABILITIES 



(a) The Outcome Set — An Example 



    (H, H, H) = $e_1$
    (H, H, T) = $e_2$
    (H, T, H) = $e_3$
    (H, T, T) = $e_4$
    (T, H, H) = $e_5$
    (T, H, T) = $e_6$
    (T, T, H) = $e_7$
    (T, T, T) = $e_8$

FIG. 3-1 Outcome set in flipping a coin three times.

In the previous section, the die example was an experiment where the
outcomes $e_1, e_2, \ldots, e_6$ were numerical, and involved no complications.
Usually, an experiment will have a more complex set of outcomes.

For example, suppose the experiment consists of flipping a coin three times
(or, equivalently, flipping three coins at once). A typical outcome (designated
as $e_4$) is the sequence H, T, T. The list of all possible outcomes, or outcome
set, is shown in Figure 3-1. Since most experiments of interest to the practical
statistician are sampling experiments, the outcome set is also often known as
the sample space S.

We note several features. The order in which the set of eight outcomes
$\{e_1, e_2, \ldots, e_8\}$ is listed doesn't matter. Whenever this is the case, it is a
mathematical convention to use curly brackets. Thus the two outcome sets
$\{e_1, e_2, \ldots, e_8\}$ and $\{e_2, e_1, e_8, \ldots, e_5\}$ are the same set.

However, since (H, H, T) and (H, T, H) are separate and distinct outcomes,
the order in which H and T appear is an essential feature; in this case
we use round brackets and call the result an ordered triple.

Finally, we note that an experimental outcome involves an entire
ordered triple. It is tempting to try to tear each triple into three parts, and
think of 24 outcomes. This mistake is avoided by writing down a dot for each
of the 8 elementary outcomes. (Hereafter, we shall often refer to outcomes
as "points" for short.)
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To simplify calculations without restricting our concepts in any way, 
let us suppose that the coin is fairly tossed, so that all 8 outcomes are equally
probable. Since all 8 probabilities must sum to 1 according to (3-4) we have

    $\Pr(e_1) = \Pr(e_2) = \cdots = \Pr(e_8) = \frac{1}{8}$        (3-5)

(b) Events

Continuing the example of 3 coins, consider the event

    E: at least 2 heads

This event includes outcomes $e_1$, $e_2$, $e_3$, and $e_5$ in Figure 3-1. We might say
the event E is the collection of points $\{e_1, e_2, e_3, e_5\}$ as in Figure 3-2. In fact,
this is a convenient way to generally define an event:

Definition.

    An event E is a subset of the outcome set S        (3-6)

FIG. 3-2 An event as a subset of points within an outcome set.

We now ask "What is the probability of E?" Using the definition of
limiting relative frequency, we may write

    $\Pr(E) = \lim_{n \to \infty} \frac{n_E}{n}$        (3-7)

where $n_E$ = frequency of E. But of course E occurs whenever the outcomes
$e_1$, $e_2$, $e_3$, or $e_5$ occur. Thus

    $n_E = n_1 + n_2 + n_3 + n_5$
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and from (3-7) 

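    $\Pr(E) = \lim \frac{n_1 + n_2 + n_3 + n_5}{n}$

    $= \lim \left( \frac{n_1}{n} + \frac{n_2}{n} + \frac{n_3}{n} + \frac{n_5}{n} \right)$

    $= \Pr(e_1) + \Pr(e_2) + \Pr(e_3) + \Pr(e_5)$        (3-8)

    $= \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} = \frac{1}{2}$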
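The calculation in (3-8) can be reproduced by listing the outcome set and summing over the points in E. The following is a minimal sketch; the names `outcomes`, `prob`, and `pr` are introduced here for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

# Outcome set of Figure 3-1: ordered triples, in the same order e_1, ..., e_8.
outcomes = list(product("HT", repeat=3))
prob = {e: Fraction(1, 8) for e in outcomes}   # fair tossing, as in (3-5)


def pr(event):
    """Probability of an event: the sum over the outcomes it contains."""
    return sum(prob[e] for e in event)


E = {e for e in outcomes if e.count("H") >= 2}   # at least 2 heads

subscripts = sorted(outcomes.index(e) + 1 for e in E)
print(subscripts)        # which e_i belong to E
print(pr(E))
print(pr(set(outcomes)))
```

The subscripts printed are those of $e_1$, $e_2$, $e_3$, and $e_5$, and summing their four probabilities of 1/8 gives 1/2. Summing over the whole outcome set gives 1, in agreement with (3-4).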

Table 3-1 Several Events in the Experiment of Figure 3-1 (Tossing 3 Coins)

    Three alternative ways of naming an event

    (1)          (2)                     (3)                          (4)
    Arbitrary    Description             Outcome List                 Probability
    Symbol
    for Event

    E            At least 2 heads        $\{e_1, e_2, e_3, e_5\}$     1/8 + 1/8 + 1/8 + 1/8 = 1/2
    F            Second coin head,       $\{e_2, e_6\}$               1/4
                 followed by tail
    G            Fewer than 2 heads      $\{e_4, e_6, e_7, e_8\}$     1/2
    H            All coins the same      $\{e_1, e_8\}$               1/4
    I            No heads                $\{e_8\}$                    1/8
    $I_1$        Exactly 1 head          $\{e_4, e_6, e_7\}$          3/8
    $I_2$        Exactly 2 heads         $\{e_2, e_3, e_5\}$          3/8
    $I_3$        Exactly 3 heads         $\{e_1\}$                    1/8
    J            Less than 2 tails       $\{e_1, e_2, e_3, e_5\}$     4/8




The obvious generalization of (3-8) is that the probability of an event
is the sum of the probabilities of all the points (or outcomes) included in
that event, that is

    $\Pr(E) = \sum \Pr(e_i)$        (3-9)

summing over just those outcomes $e_i$ which are in E. We note an analogy
between mass (in physics) and probability: the mass of an object is the sum
of the masses of all the atoms in that object; the probability of an event is
the sum of the probabilities of all the outcomes included in that event.

Various events are considered in Table 3-1; all the outcomes included
in each event are listed in column 3. Since the probability of each outcome
is 1/8, the calculation of the probability of each event in column 4 is very
simple. The value of the list is evident when we consider the first and last
events in this table. In fact, they are the same event; although this may not
have been clear immediately from the description, the list makes it obvious.



(c) Combining Events

As an example, we might ask for the probability of "G or H," that is,
that there will be fewer than 2 heads or all coins the same (or both). This
combined event is denoted by "$G \cup H$" and may be read "G union H" as
well as "G or H." From the lists of Table 3-1 it can be seen that

    $G \cup H = \{e_4, e_6, e_7, e_8, e_1\}$.

In general, for any two events G, H:

Definition.

    $G \cup H$ = set of points which are in G, or in H, or in both.        (3-10)

FIG. 3-3 Venn diagrams, illustrating probability of combined events. (The rectangle in
each case represents the whole sample space; hence the probabilities of all points (or
outcomes) within the rectangle sum to 1.) (a) $G \cup H$ shaded, "G or H"; (b) $G \cap H$ shaded,
"G and H"; (c) $I \cup J$ shaded.

A little abstract art in Figure 3-3a, called a Venn diagram, illustrates
this definition. Since five of the eight equiprobable outcomes are included
in $G \cup H$, its probability is 5/8.
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Similarly, we might be interested in the event "G and H" that is, that 
there will be fewer than 2 heads, and all coins the same. This is clearly a much
more restricted combined event; any outcome must satisfy both G and H,
rather than either G or H. Again, we can use a Venn diagram as in Figure
3-3b; this shows clearly that there is only one outcome (3 tails) that qualifies.
This combined event is denoted by $G \cap H$, and may be read "G intersect H"¹
as well as "G and H." The lists of G and H in Table 3-1 confirm that

    $G \cap H = \{e_8\}$

since the only outcome appearing in both lists is $e_8$. Hence the probability
of $G \cap H$ is 1/8. In general, for any 2 events G, H:

Definition.

    $G \cap H$ = set of points which are in both G and H.        (3-11)



1 To remember when $\cup$ or $\cap$ is used, it may help to recall that $\cup$ stands for "union,"
and that $\cap$ resembles the letter "A" in the word "and." These technical symbols are used
to avoid the ambiguity that might occur if we used ordinary language. For example, the
sentence "$E \cup F$ has 5 points" has a precise meaning, but the informal "E or F has 5 points"
is ambiguous.

(d) Probabilities of Combined Events

We have already shown how $\Pr(G \cup H)$ may be found from the Venn
diagram in Figure 3-3. Now we should like to develop a formula. First
consider a pair of events that do not have any points in common, such as I
and J from Table 3-1. (We also say that they are mutually exclusive, or do
not overlap.) From Figure 3-3c it is obvious that

    $\Pr(I \cup J) = \Pr(I) + \Pr(J)$        (3-12)

    $\frac{5}{8} = \frac{1}{8} + \frac{4}{8}$

But this simple addition does not always work. For example

    $\Pr(G \cup H) \ne \Pr(G) + \Pr(H)$        (3-13)

    $\frac{5}{8} \ne \frac{4}{8} + \frac{2}{8}$

What has gone wrong in this case? Since G and H overlap, in summing
Pr(G) and Pr(H) we count the intersection $G \cap H$ twice; this is why the sum
in (3-13) overestimates. This is easily corrected; subtracting $\Pr(G \cap H)$ eliminates
this double counting. Thus, we have shown

Theorem.

    $\Pr(G \cup H) = \Pr(G) + \Pr(H) - \Pr(G \cap H)$        (3-14)

In our example

    $\frac{5}{8} = \frac{4}{8} + \frac{2}{8} - \frac{1}{8}$

Formula (3-14) is in fact quite general, and applies to any two events. It
works in this example where G and H overlap. It also applies in cases like
(3-12) where I and J do not overlap; hence $\Pr(I \cap J) = 0$, and this last term
disappears when (3-14) is applied. For emphasis we may write in general,

Theorem.

    $\Pr(I \cup J) = \Pr(I) + \Pr(J)$
    if I and J are mutually exclusive.        (3-15)

But it must be recognized that this is just a special case of (3-14).
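Formulas (3-14) and (3-15) can be checked directly on the events of Table 3-1 by treating each event as a set of outcomes. The sketch below assumes fair tossing, so that every outcome has probability 1/8; the helper `pr` is introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=3))   # e_1, ..., e_8 of Figure 3-1


def pr(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))


G = {e for e in outcomes if e.count("H") < 2}    # fewer than 2 heads
H = {e for e in outcomes if len(set(e)) == 1}    # all coins the same
I = {e for e in outcomes if e.count("H") == 0}   # no heads
J = {e for e in outcomes if e.count("T") < 2}    # less than 2 tails

# (3-14): overlapping events need the intersection subtracted once.
print(pr(G | H), pr(G) + pr(H) - pr(G & H))

# (3-15): for mutually exclusive events the intersection is empty.
print(I & J, pr(I | J), pr(I) + pr(J))
```

Here `|` and `&` are Python's set union and intersection, playing the roles of $\cup$ and $\cap$. Both comparisons give 5/8 on each side.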

A collection of several events is defined as mutually exclusive if there is
no overlap, i.e., if no outcome $e_i$ belongs to more than one event. For
example, in Table 3-1, events I, $I_1$, and $I_2$ are mutually exclusive; but E, F,
and I are not, because E and F overlap at $e_2$.

The collection of events $\{I, I_1, I_2, I_3\}$ is mutually exclusive, and also
"covers" the whole sample space S. We therefore call it a partition of S.
In general,

Definition.

    A partition of a sample space S is a collection
    of mutually exclusive events $\{I_1, \ldots, I_n\}$ whose
    union is the whole sample space S.

    $I_1 \cup I_2 \cup \cdots \cup I_n = S$        (3-16)

Thus a partition completely divides the sample space into nonoverlapping
events, as illustrated in Figure 3-4b.

In Table 3-1 note that G consists of exactly those points which are not
in E. We therefore could call G the "complement of E," or "not E," and
denote it by $\bar{E}$. And in general, for any event E

Definition.

    $\bar{E}$ = points in sample space not in E.        (3-17)

FIG. 3-4 Venn diagrams to illustrate definitions. (a) $E_1$ and $E_2$ are mutually exclusive;
(b) $E_1, E_2, \ldots, E_6$ form a partition; (c) $\bar{E}$ shaded. (Note. $\{E, \bar{E}\}$ form a partition.) Sample
space S in each case is represented by rectangle.


An event and its complement $\{E, \bar{E}\}$ form a very simple partition.
Because these events are mutually exclusive, by (3-15)

    $\Pr(E \cup \bar{E}) = \Pr(E) + \Pr(\bar{E})$        (3-18)

and since $\{E, \bar{E}\}$ form a partition

    $\Pr(E \cup \bar{E}) = 1$        (3-19)

Substituting (3-19) into (3-18)

    $1 = \Pr(E) + \Pr(\bar{E})$

This yields a solution for Pr(E) in terms of $\Pr(\bar{E})$:

Theorem.

    $\Pr(E) = 1 - \Pr(\bar{E})$        (3-20)

As an example, consider the probability of getting at least one head. The
complement is "no heads," and is very simple to calculate. Thus

    Pr (at least one head) = 1 − Pr (no heads)
                           = 1 − 1/8 = 7/8

This is not the only way to answer this question, but it is by far the simplest,
since Pr (no heads) is so easy to evaluate. The student should be on the alert
for similar problems: the key words to watch for are "at least," "more than,"
"less than," "no more than," etc.

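The complement rule (3-20) can be compared with a direct count for the three-coin experiment. This is a minimal sketch assuming fair tossing; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=3))   # the 8 ordered triples

# Direct route: count every outcome containing at least one head.
at_least_one_head = [e for e in outcomes if "H" in e]
direct = Fraction(len(at_least_one_head), len(outcomes))

# Complement route: only the single outcome (T, T, T) has no heads.
no_heads = [e for e in outcomes if "H" not in e]
via_complement = 1 - Fraction(len(no_heads), len(outcomes))

print(direct, via_complement)
```

Both routes give 7/8, but the complement route needs only one outcome to be identified, and that advantage grows quickly as the number of coins increases.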


PROBLEMS 

3-5 Suppose a penny and nickel are thrown on the table.
(a) The outcome set may be listed conveniently:

    (penny, nickel)

    • (H, H) = $e_1$
    • (H, T) = $e_2$
    • (T, H) = $e_3$
    • (T, T) = $e_4$

Satisfy yourself that all 4 outcomes are equally likely, in 2 ways:

1. Philosophical argument. Obviously $e_1$ and $e_2$ are equally likely
because they differ only in what happens to the symmetric nickel.
Similarly $e_3$ and $e_4$ are equally likely. Finally, $e_1$ and $e_3$ are equally
likely because they differ only in what happens to the symmetric
penny. Thus all 4 outcomes are equally likely.

2. Empirical argument. Have everyone in the class repeat the experiment
10 times so that a large amount of data can be pooled. Is the relative
frequency of each outcome about 1/4?

(b) Consider the following alternate outcome set [which is recognized
as just a reduction of the outcome set in (a)].

    • Both heads
    • Both tails
    • One of each

Are these three outcomes equally likely? What are their probabilities?
(c) What is the probability of at least one head? Answer using the
two alternate outcome sets, and verify that you get the same answer.
3-6 The outcome set of Problem 3-5(a) could alternatively be written as

                        Nickel
                    H            T

    Penny    H    • (H, H)     • (H, T)
             T    • (T, H)     • (T, T)

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In the same way, list the outcome set when a pair of dice are 
thrown— one red, one white. Then calculate the probability of 

(1) A total of 4 dots. 

(2) A total of 7 dots. 

(3) A total of 7 or 11 dots (as in Problem 3-4). 

(4) A double. 

(5) A total of at least 8 dots. 

(6) A double 3. 

(7) A 1 on one die, 5 on the other. 

(8) Would you get the same answers to (l)-(7) if the dice were both 
painted white? In particular, compare the chance of a {3, 3} combina- 
tion to the chance of a {1, 5} combination. 

3-7 Suppose the coin of Figure 3-1 were not fairly thrown, and that over
the long run, the following relative frequencies were observed

    e            Pr (e)

    (H H H)      .15
    (H H T)      .10
    (H T H)      .10
    (H T T)      .15
    (T H H)      .15
    (T H T)      .10
    (T T H)      .10
    (T T T)      .15

Recalling the definitions of Table 3-1,

    G: fewer than 2 heads
    H: all coins the same

find the following probabilities. (Hint. Use (3-9) and a Venn diagram.)

(a) Pr (G); Pr (H); Pr ($G \cup H$); Pr ($G \cap H$)

(b) Verify that (3-14) holds true.
Let us further define

    K: fewer than 2 tails
    L: some coins different

Then find

(c) Pr (K); Pr (L); Pr ($K \cup L$); Pr ($K \cap L$)

(d) Verify that (3-14) holds true.
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3-8 (a) List ;he sample space of 4 coins tossed simultaneously. 

(b) Define events A : all coins the same 
B: precisely 1 head 
C: at least 2 heads 

Evaluate Pr (A) + Pr (B) + Pr (C). Do these events 
form a partition? 

(c) Redefine A as "all tails." Do A y B, C now form a partition? What 
is Pr (A) + Pr (B) + Pr (C)? 

3-9 When a coin is fairly tossed 4 times, let Y denote the number of changes
in sequence. For example, the outcome H T H H may be written
H/T/HH, where the two changes in sequence are indicated by slashes;
similarly, the outcome H/TTT has only 1 change. What is

(a) Pr (Y = 1)

(b) Pr (Y = 2)

(c) Do the events of (a) and (b) form a partition? 

3-10 (a) What is the probability of at least one head when 4 coins are 
tossed? 

(b) What is the probability of at least one head when 10 coins are
tossed?

=> 3-11¹ Suppose a class of 100 students consists of several groups, in the
following proportions:

    Taking math     Not taking math
    Men             Women

    21/100          38/100
    23/100          22/100

If a student is chosen by lot to be class president, what is the
chance that the student will be:

(a) A man?

(b) A woman?

(c) Taking math?

(d) A man, or taking math?

(e) A man, and taking math?

(f) If the class president in fact turned out to be a man, what is the
chance that he is taking math? Not taking math?

1 Problems preceded by arrows are important, because they introduce a later section in
the text.

=> 3-12 The students of a certain school engage in various sports in the
following proportions:

Football, 30% of all students. 
Basketball, 20%. 
Baseball, 20%. 

Both football and basketball, 5%. 
Both football and baseball, 10%. 
Both basketball and baseball, 5%. 
All three sports, 2%. 
If a student is chosen by lot for an interview, what is the chance 
that he will be : 

(a) An athlete (playing at least one sport) ? 

(b) A football player only ? 

(c) A football player or a baseball player? 

If an athlete is chosen by lot, what is the chance that he will be: 

(d) A football player only ? 

(e) A football player or a baseball player? 
Hint. Use a Venn diagram. 

(f) Use your result in (a) to generalize (3-14). 



3-4 CONDITIONAL PROBABILITY 

Continuing with the experiment of fairly tossing 3 coins, suppose that
the tossing is completed, and we are informed that there were fewer than 2
heads, i.e., that event G had occurred. Given this condition, what is the
probability that event I (no heads) occurred? This is an example of
"conditional probability," and is denoted as Pr (I/G), or "the probability of I,
given G."

The problem may be solved by keeping in mind that our relevant
outcome set is reduced to G. From Figure 3-5 it is evident that Pr (I/G) = 1/4.

The second illustration in this figure shows the conditional probability
of H (all coins the same), given G (fewer than 2 heads). Our knowledge of G
means that the only relevant part of H is H ∩ G ("no heads" = I) and thus
Pr (H/G) = 1/4. This example is immediately recognized as equivalent to
the preceding one; we are just asking the same question in two different ways.
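
The reduced-sample-space argument can be checked by brute-force enumeration. The following Python sketch is an illustration only: the names `conditional`, `in_G`, `in_I` and `in_H` are assumptions introduced here, not notation from the text. It lists the 8 equiprobable outcomes of 3 fair tosses, keeps those in G, and counts how many of them also lie in I or in H.

```python
from fractions import Fraction
from itertools import product

# The 8 equiprobable outcomes of 3 fair tosses, e.g. ('H', 'T', 'T').
sample_space = list(product("HT", repeat=3))


def in_G(outcome):
    """G: fewer than 2 heads."""
    return outcome.count("H") < 2


def in_I(outcome):
    """I: no heads."""
    return outcome.count("H") == 0


def in_H(outcome):
    """H: all coins the same."""
    return len(set(outcome)) == 1


def conditional(event, given):
    """Pr(event / given), counted inside the reduced sample space."""
    reduced = [o for o in sample_space if given(o)]
    favorable = [o for o in reduced if event(o)]
    return Fraction(len(favorable), len(reduced))


pr_I_given_G = conditional(in_I, in_G)
pr_H_given_G = conditional(in_H, in_G)
print(pr_I_given_G, pr_H_given_G)
```

Counting in this way is valid only because the outcomes are equiprobable; for unfair coins the counts would have to be replaced by sums of outcome probabilities.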

Suppose Pr (G), Pr (H), and Pr (G ∩ H) have already been computed for
the original sample space S. It may be convenient to have a formula for
Pr (H/G) in terms of them. We therefore turn to the definition (3-1) of
probability as relative frequency. We imagine repeating the experiment n
times, with G occurring n(G) times, of which H also occurs n(H ∩ G) times.



FIG. 3-5 Venn diagrams to illustrate conditional probability. (a) Pr (I/G).
(b) Pr (H/G). Note Pr (H/G) is identical to Pr (I/G).

Panel (a): Knowledge that G has occurred makes the original sample
space S irrelevant. G now becomes the new sample space. The event I
includes one of four equiprobable outcomes in sample space G.
Thus Pr (I/G) = 1/4.

Panel (b): Knowledge that G has occurred makes the original sample
space S (including outcome e_1 in H) irrelevant. G becomes the new
sample space. H ∩ G is the only relevant part of H.

The ratio n(H ∩ G)/n(G) is the conditional relative frequency, and in the limit

$$\Pr(H/G) \triangleq \lim_{n \to \infty} \frac{n(H \cap G)}{n(G)}$$

On dividing numerator and denominator by n, we obtain

$$\Pr(H/G) = \lim_{n \to \infty} \frac{n(H \cap G)/n}{n(G)/n}$$   (3-21)

$$\Pr(H/G) = \frac{\Pr(H \cap G)}{\Pr(G)}$$   (3-22)

This formula is often used in a slightly different form, obtained by cross
multiplying by Pr (G):

$$\Pr(H \cap G) = \Pr(G)\,\Pr(H/G)$$   (3-23)

2 In this section and the next, we shall assume all events under consideration have nonzero
probabilities. This permits us to divide legitimately by various probabilities at will.

PROBLEMS

(3-13) Flip 3 coins over and over again, recording your results as in the
following table.

    Trial     G          Accumulated    If G Occurs,    Accumulated    Conditional Relative
    Number    Occurs?    Frequency      Then H Also     Frequency      Frequency
    n                    n(G)           Occurs?         n(H ∩ G)       n(H ∩ G)/n(G)

    1         No         0
    2         Yes        1              Yes             1              1.00
    3         No         1
    4         Yes        2              No              1               .50
    5         Yes        3              Yes             2               .67

After 50 trials, is the relative frequency n(H ∩ G)/n(G) close to
the probability calculated theoretically in the previous section? (If
not, it is because of insufficient trials, so pool the data from the whole
class.)
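
The coin-flipping experiment of Problem 3-13 can also be imitated on a computer. The sketch below is only one possible simulation: the function name `simulate`, the fixed seed, and the choice of 100,000 trials are assumptions made here for illustration. It accumulates n(G) and n(H ∩ G) exactly as in the table and reports the conditional relative frequency, which should settle near the theoretical Pr (H/G) = 1/4.

```python
import random


def simulate(trials, seed=0):
    """Toss 3 fair coins `trials` times; return (n(G), n(H and G))."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n_G = 0
    n_HG = 0
    for _ in range(trials):
        toss = [rng.choice("HT") for _ in range(3)]
        heads = toss.count("H")
        if heads < 2:          # G: fewer than 2 heads
            n_G += 1
            if heads == 0:     # H and G together: all coins the same, i.e. all tails
                n_HG += 1
    return n_G, n_HG


n_G, n_HG = simulate(100_000)
ratio = n_HG / n_G
print(n_G, n_HG, round(ratio, 3))
```

With only 50 trials the ratio can wander noticeably from 1/4, which is why the problem suggests pooling the data from the whole class.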

3-14 Using the unfair coins and definitions of Problem 3-7, calculate 

(a) Pr (G/H) 

(b) Pr (H/G)

(c) Pr (tf/L) 

(d) Pr (RjL) 

3-15 (a) A consumer may buy brand X or brand Y but not both. The
probability of buying brand X is .06, and brand Y is .15. Given that
the consumer bought either X or Y, what is the probability that he
bought brand X?

(b) If events A and B are mutually exclusive (and of course nonempty,
i.e., include at least one possible outcome), is it always true
that

Pr (A/A ∪ B) = [Pr (A)]/[Pr (A) + Pr (B)]?

3-16 A bowl contains 3 red chips (numbered R_1, R_2, R_3) and 2 white chips
(numbered W_1, W_2). A sample of 2 chips is drawn, one after the other.
List the sample space. For each of the following events, diagram the
subset of outcomes included and find its probability.

(a) Second chip is red. 

(b) First chip is red. 

(c) Second chip is red, given the first chip is red. 

(d) First chip is red, given the second chip is red. 

(e) Both chips are red. 

Then note the following features, which are perhaps intuitively
obvious also:

(1) The answers to (a) and (b) agree, as do the answers to (c) and (d).

(2) Show that the answer to (e) can be found alternatively by applying 
(3-23) to parts (b) and (c). 

(3) Extension of part (2) : if 3 chips are drawn what is the probability 
that all 3 are red? Can you now generalize Theorem (3-23)? 

(3-17) Two cards are drawn from an ordinary deck. What is the probability
that

(a) They are both aces?

(b) They are the two black aces?

(c) They are both honor cards (ace, king, queen, jack or ten)?

3-18 A poker hand (5 cards) is drawn from an ordinary deck of cards.
What is the chance of drawing, in order,

(a) 2 aces, then 3 kings?

(b) 2 aces, then 2 kings, finally a queen?

(c) 4 aces, then a king?

What is the chance of drawing, in any order whatsoever,

(d) 4 aces and a king?

(e) 4 aces?

(f) "Four of a kind" (i.e., 4 aces, or 4 kings, or 4 jacks, etc.)?

If the 5 cards are drawn with replacement (i.e., each card is replaced
in the deck before drawing the next card, so that it is no longer a real
poker deal), what is the probability of drawing, in any order,

(g) Exactly 4 aces?

3-19 A supply of 10 light bulbs contains 2 defective bulbs. If the bulbs
are picked up in random order, what is the chance that

(a) The first two bulbs are good?

(b) The first defective bulb was picked 6th?

(c) The first defective bulb was not picked until the 9th?
3-20 Two dice are thrown. Let 

E: first die is 5
F: total is 7
G: total is 10

Compute the relevant probabilities using Venn diagrams. Show that:

(a) Pr (F/E) = Pr (F).

(b) Pr (G/E) ≠ Pr (G).

(c) Is it true that Pr (E/F) = Pr (E)? Do you think this is closely
related to (a), or just an accident?

3-21 If E and F are any 2 mutually exclusive events (and both are nonempty,
of course), what can be said about Pr (E/F)?

3-22 A company employs 100 persons: 75 men and 25 women. The
accounting department provides jobs for 12% of the men and 20%
of the women. If a name is chosen at random from the accounting
department, what is the probability that it is a man? That it is a
woman?

3-23 (Bayes' Theorem). In a population of workers, suppose 40% are 
grade school graduates, 50% are high school graduates, and 10% are 
college graduates. Among the grade school graduates, 10% are un- 
employed, among the high school graduates, 5% are unemployed, 
and among the college graduates 2% are unemployed. 

If a worker is chosen at random and found to be unemployed, 
what is the probability that he is 

(a) A grade school graduate? 

(b) A high school graduate? 

(c) A college graduate ? 

(This problem is important as an introduction to Chapter 15 ; therefore 
its answer is given in full.) 

Answer. Think of probability as proportion of the population, if you 
like. 

    Classes of Workers       Pr (C_i)    Pr (E/C_i)    Pr (E ∩ C_i) = Pr (E/C_i) Pr (C_i)

    C_1 (grade school)         .40          .10            .040
    C_2 (high school)          .50          .05            .025
    C_3 (college)              .10          .02            .002

                                     Pr (E) = Σ Pr (E ∩ C_i) = .067

The old sample space is the population of workers. Effect E (unemployment)
is the new sample space, shaded; Pr (E) = .067.

In the new sample space shaded, (3-22) gives

(a) Pr (C_1/E) = .040/.067 = .597

(b) Pr (C_2/E) = .025/.067 = .373

(c) Pr (C_3/E) = .002/.067 = .030

             check, sum = 1.000
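
The arithmetic of this answer can be reproduced in a few lines. The Python sketch below is an illustration under stated assumptions: the dictionary names `priors`, `unemployed_given`, `joint` and `posterior` are labels chosen here, and the figures are the percentages given in Problem 3-23. Each prior is multiplied by its conditional probability as in (3-23), the products are summed to give Pr (E), and (3-22) is then applied.

```python
# Prior probabilities Pr(C_i) and conditional probabilities Pr(E/C_i)
# taken from the statement of Problem 3-23.
priors = {"grade school": 0.40, "high school": 0.50, "college": 0.10}
unemployed_given = {"grade school": 0.10, "high school": 0.05, "college": 0.02}

# Pr(E and C_i) = Pr(E/C_i) * Pr(C_i), as in (3-23).
joint = {c: priors[c] * unemployed_given[c] for c in priors}

# Pr(E) is the sum over all classes of workers.
pr_E = sum(joint.values())

# Pr(C_i/E) = Pr(E and C_i) / Pr(E), as in (3-22).
posterior = {c: joint[c] / pr_E for c in priors}

for c in priors:
    print(f"{c:12s}  joint = {joint[c]:.3f}  posterior = {posterior[c]:.3f}")
print(f"Pr(E) = {pr_E:.3f}")
```

The posteriors necessarily sum to 1, which is the same check made at the end of the printed answer.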
Notes on Bayes' Theorem. Problem 3-23 is an example of Bayes'
Theorem, which may be stated as follows:

Certain "causes" (education levels) C_1, C_2, . . . , C_n have prior
probabilities Pr (C_i). In a sense the causes produce an "effect" E
(unemployment) not with certainty, but with conditional probabilities
Pr (E/C_i). Using conditional probability manipulations, one calculates
eventually the probability of a cause given the effect, Pr (C_i/E):

    Given                  Deduced
    Pr (C_i)
    Pr (E/C_i)      ->     Pr (C_i/E)

3-24 In a certain country, it rains 40% of the days and shines 60% of the
days. A barometer manufacturer, in testing his instrument in the lab,
has found that it sometimes errs: on rainy days it erroneously predicts
"shine" 10% of the time, and on shiny days it erroneously predicts
"rain" 30% of the time.

(a) In predicting tomorrow's weather before looking at the barometer,
the (prior) chance of rain is 40%. After looking at the barometer and
seeing it predict "rain," what is the (posterior) chance of rain?

(b) What is the posterior chance of rain if an improved barometer
(error rates of 10 and 20% respectively) predicts "rain"?

(c) What is the posterior chance of shine if the improved barometer
predicts "rain"?



3-5 INDEPENDENCE 

In Problem 3-20 we noticed that Pr (F/E) = Pr (F). This means that the
chance of F, knowing E, is exactly the same as the chance of F, without
knowing E; or, knowledge of E does not change the probability of F at all.
It seems reasonable, therefore, to call F statistically independent of E. In
fact, this is the basis for the general definition:

Definition.

An event F is called statistically independent
of an event E if Pr (F/E) = Pr (F)                       (3-24)

Of course, in the case of events G and E, where Pr (G/E) ≠ Pr (G), we would
say that G was statistically dependent on E. In this case, knowledge of E
changes the probability of G.

We now develop the consequences of F being independent of E.
Substituting (3-22) in (3-24), we obtain

$$\frac{\Pr(F \cap E)}{\Pr(E)} = \Pr(F)$$

hence

$$\Pr(F \cap E) = \Pr(F)\,\Pr(E)$$   (3-25)

We can reverse this argument, and work backwards from (3-25) as
follows:

$$\frac{\Pr(F \cap E)}{\Pr(F)} = \Pr(E)$$

$$\Pr(E/F) = \Pr(E)$$   (3-26)



That is, E is independent of F whenever F is independent of E. In other 
words, the result in Problem 3-20(c) above was no accident. In view of this 
symmetry, we may henceforth simply state that E and F are statistically 
independent of each other, whenever any of the three logically equivalent 
statements (3-24), (3-25), or (3-26) is true. Usually, statement (3-25) is 
the preferred form, in view of its symmetry. Sometimes, in fact, this "multi- 
plication formula" is taken as the definition of statistical independence. 
But this is just a matter of taste. 

Notice that so far we have insisted on the phrase "statistical independence,"
in order to distinguish it from other forms of independence:
philosophical, logical, or whatever. For example, we might be tempted to
say that in our dice problem, F was "somehow" dependent on E because the 
total of the two tosses depends on the first die. This vague notion of depend- 
ence is of no use to the statistician, and will be considered no further. But 
let it serve as a warning that statistical independence is a very precise concept, 
defined by (3-24), (3-25), or (3-26) above. 
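
To see how precise the definition is, the dice events of Problem 3-20 can be tested against the multiplication formula (3-25) by enumeration. This Python sketch is illustrative only; the helper names `pr` and `independent` are assumptions introduced here. It builds the 36 equiprobable outcomes for two dice and checks whether Pr (A ∩ B) equals Pr (A) Pr (B).

```python
from fractions import Fraction
from itertools import product

# The 36 equiprobable outcomes (first die, second die).
dice = list(product(range(1, 7), repeat=2))


def pr(event):
    """Probability of an event, given as a predicate on an outcome."""
    return Fraction(sum(1 for o in dice if event(o)), len(dice))


def E(o):
    return o[0] == 5           # E: first die is 5


def F(o):
    return o[0] + o[1] == 7    # F: total is 7


def G(o):
    return o[0] + o[1] == 10   # G: total is 10


def independent(A, B):
    """True when the multiplication formula (3-25) holds exactly."""
    return pr(lambda o: A(o) and B(o)) == pr(A) * pr(B)


print(independent(E, F), independent(E, G))
```

Exact fractions are used so that the equality test is not disturbed by rounding. The total of 7 turns out to be independent of the first die even though it "depends" on it in the everyday sense, while the total of 10 does not.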

Now that we clearly understand statistical independence, and agree that 
this is the only kind of independence we shall consider, we shall run no risk 
of confusion if we are lazy and drop the word "statistical." 

Our results so far are summarized as follows:

                       Pr (E ∪ F)                        Pr (E ∩ F)

    General Theorem    = Pr (E) + Pr (F) - Pr (E ∩ F)    = Pr (F) · Pr (E/F)

    Special Case       = Pr (E) + Pr (F)                 = Pr (F) · Pr (E)
                       if E and F are mutually           if E and F are
                       exclusive; i.e.,                  independent; i.e.,
                       if Pr (E ∩ F) = 0                 if Pr (E/F) = Pr (E)

PROBLEMS

3-25 Three coins are fairly tossed.

E_1: first two coins are heads;
E_2: last coin is a head;
E_3: all three coins are heads.

Try to answer the following questions intuitively (does knowledge of
the condition affect your betting odds?). Then verify by drawing the
sample space and calculating the relevant probabilities for (3-24).

(a) Are E_1 and E_2 independent?

(b) Are E_1 and E_3 independent?

3-26 Repeat Problem 3-25 using the three unfair coins whose sample space
is as follows (compare Problem 3-7).

    e            Pr (e)

    (H H H)       .15
    (H H T)       .10
    (H T H)       .10
    (H T T)       .15
    (T H H)       .15
    (T H T)       .10
    (T T H)       .10
    (T T T)       .15

3-27 A certain electronic mechanism has 2 bulbs which have been observed
on or off with the following long-run relative frequencies:

                        Bulb 2
                     On       Off
    Bulb 1    On     .15      .45
              Off    .10      .30

This table means, for example, that both bulbs were simultaneously
off 30 percent of the time.

(a) Is "bulb 1 on" independent of "bulb 2 on"?

(b) Is "bulb 1 off" independent of "bulb 2 on"?

3-28 A single card is drawn from a deck of cards, and let

E: it is an ace
F: it is a heart.

Are E and F independent, when we use

(a) An ordinary 52-card deck. 

(b) An ordinary deck, with all the spades deleted. 

(c) An ordinary deck, with all the spades from 2 to 9 deleted. 



3-6 OTHER VIEWS OF PROBABILITY 

In Section 3-1 we defined probability as the limit of relative frequency.
There are several other approaches, including symmetric probability,
axiomatic probability, and subjective probability.



(a) Symmetric Probability 



The physical symmetry of a fair die assures us that all six of its outcomes
are equally probable. Thus

$$\Pr(e_1) = \Pr(e_2) = \cdots = \Pr(e_6)$$

In order that these six probabilities sum to one, each must be 1/6
(compare to (3-5)).

In general, for an experiment having N equally likely outcomes or
points, for each point e_i

$$\Pr(e_i) = \frac{1}{N}$$

Then, for an event E consisting of N_E points, the probability is given
by (3-9) as

$$\Pr(E) = \sum \Pr(e_i) = N_E \left(\frac{1}{N}\right)$$

where the summation Σ extends only over points e_i in E (N_E in number).
Thus, for equally probable outcomes

$$\Pr(E) = \frac{N_E}{N}$$   (3-27)

For example, in rolling a fair die consider the event 

E: number of dots is an even number. 

E consists of three of the six equiprobable elementary outcomes (2, 4, or 6 
dots); thus its probability is 3/6. 
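
Formula (3-27) amounts to counting points. As a small illustration (the function name `symmetric_pr` is an assumption made here, not the text's notation), the following Python sketch counts the N_E points of an event among N equally likely outcomes and forms the ratio N_E/N.

```python
from fractions import Fraction


def symmetric_pr(event, outcomes):
    """Pr(E) = N_E / N, valid only when all outcomes are equally likely."""
    n_E = sum(1 for e in outcomes if event(e))
    return Fraction(n_E, len(outcomes))


die = [1, 2, 3, 4, 5, 6]  # the six faces of a fair die

# E: number of dots is an even number.
pr_even = symmetric_pr(lambda dots: dots % 2 == 0, die)
print(pr_even)
```

The function has no way of knowing whether the outcomes really are equally likely; that assumption has to be supplied by the user, for example from the physical symmetry of the die.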

Symmetric probability theory begins with (3-27) as the definition of
probability, and gives a simpler development than our earlier relative
frequency approach. However, our earlier analysis was more general;
although the examples we cited often involved equiprobable outcomes, the
theory we developed was in no way limited to such cases. In reviewing it,
you should confirm that it may be applied whether or not outcomes are
equiprobable; special attention should be given to those cases (e.g., Problem
3-26) where outcomes were not equiprobable.

Not only is symmetric probability limited because it lacks generality; it
also has a major philosophical weakness. Note how the definition of
probability in (3-27) involves the phrase "equally probable"; we are guilty of
circular reasoning.

Our own relative frequency approach to probability suffers from the
same philosophical weakness. We might ask what sort of limit is meant in
equation (3-1). It is logically possible that the relative frequency n_1/n behaves
badly, even in the limit; for example, no matter how often we toss a die, it
is just conceivable that the ace will keep turning up every time, making
lim n_1/n = 1. Therefore, we should qualify equation (3-1) by stating that the
limit occurs with high probability, not logical certainty. In using the concept
of probability in the definition of probability, we are again guilty of circular
reasoning.



(b) Axiomatic Objective Probability 

The only philosophically sound approach, in fact, is an abstract axiomatic
approach. In a simplified version, the following properties are taken
as axioms:

Axioms.

$$\Pr(e_i) \ge 0$$   (3-2) repeated

$$\Pr(e_1) + \Pr(e_2) + \cdots + \Pr(e_N) = 1$$   (3-4) repeated

$$\Pr(E) = \sum \Pr(e_i)$$   (3-9) repeated

Then the other properties, such as (3-1), (3-3), and (3-20) are theorems
derived from these axioms, with axioms and theorems together comprising
a system of analysis that appropriately describes probability situations such
as die tossing, etc.

Equation (3-1) is particularly important, and is known as the law of
large numbers. Equations (3-3) and (3-20) may be proved very easily, so
easily in fact that we shall give the proof to illustrate how nicely this
axiomatic theory can be developed. We can prove even stronger results: for any
event E,

Theorems.

$$0 \le \Pr(E)$$   (3-28), like (3-2)

$$\Pr(E) \le 1$$   (3-29), like (3-3)

$$\Pr(\bar{E}) = 1 - \Pr(E)$$   (3-30), repeating (3-20)

Proof. According to axioms (3-9) and (3-2), Pr (E) is the sum of positive
terms, and is therefore positive; thus (3-28) is proved.
To prove (3-30), we write out axiom (3-4):

$$\underbrace{\Pr(e_1) + \Pr(e_2) + \cdots}_{\text{Terms for } E} \; + \underbrace{\cdots + \Pr(e_N)}_{\text{Terms for } \bar{E}} = 1$$   (3-31)

According to (3-9), this is just

$$\Pr(E) + \Pr(\bar{E}) = 1$$

from which (3-30) follows.

In (3-28) we proved that every probability is positive or zero. In particular
Pr (Ē) is positive or zero; substituting this into (3-31) ensures that:

$$\Pr(E) \le 1$$   (3-29) proved.

Thus our above theorems are established; other theorems may similarly
be derived.



(c) Subjective Probability 

Sometimes called personal probability, this is an attempt to deal with 
events that cannot be repeated, even conceptually, and hence cannot be 
given any frequency interpretation. For example, consider events such as 
an increase in the stock market average tomorrow, or the overthrow of a 
certain government within the next month. These events are described by the 
layman as "likely" or "unlikely," even though there is no hope of estimating 
this by observing their relative frequency. Nevertheless, their likelihood 
vitally influences policy decisions, and as a consequence must be estimated 
in some rough-and-ready way. It is only then that decisions can be made 
on what risks are worth taking. 

To answer this practical need, an axiomatic theory of personal probability
has been developed. Roughly speaking, personal probability is defined
by the odds one would give in betting on an event; we shall find this a useful
concept later in decision theory (Chapter 15).

Review Problems



3-29 A tetrahedral (four-sided) die has been loaded. Find Pr (e_4) if possible,
given the following conditions. (If the problem is impossible, state so.)

(a) Pr (e_1) = .2; Pr (e_2) = .4; Pr (e_3) = .1

(b) Pr (e_1) = .4; Pr (e_2) = .4; Pr (e_3) = .3

(c) Pr (e_1) = .6; Pr (e_3) = .2

(d) Pr (e_1) = .7; Pr (e_2) = .5

3-30 In a family of 3 children, what is the chance of

(a) At least one boy?

(b) At least 2 boys?

(c) At least 2 boys, given at least one boy?

(d) At least 2 boys, given that the eldest is a boy?

3-31 Suppose that the last 3 customers out of a restaurant all lose their
hat-checks, so that the girl has to hand back their 3 hats in random
order. What is the probability

(a) That no man will get the right hat?

(b) That exactly 1 man will?

(c) That exactly 2 men will?

(d) That all 3 men will?

3-32 What is the probability that

(a) 3 people picked at random have different birthdays?

(b) A roomful of 30 people all have different birthdays?

(c) In a roomful of 30 people there is at least one pair with the same
birthday?

3-33 A bag contains a thousand coins, one of which has heads on both
sides. A coin is drawn at random. What is the probability that it is the
loaded coin, if it is flipped and turns up heads without fail

(a) 3 times in a row

(b) 10 times in a row

(c) 20 times in a row.

3-34 Repeat Problem 3-33 when the loaded coin in the bag has both H and
T faces, but is biased so that the probability of H is 3/4.



chapter 4 



Random Variables and 
Their Distributions 

4-1 DISCRETE RANDOM VARIABLES 

Again consider the experiment of fairly tossing 3 coins. Suppose that 
our only interest is the total number of heads. This is an example of a random 
variable or variate and is customarily denoted by a capital letter thus: 

X = the total number of heads (4-1) 

The possible values of X are 0, 1, 2, 3; however, they are not equally 
likely. To find what the probabilities are, it is necessary to examine the original 
sample space in Figure 4-1. Thus, for example, the event "two heads" 
(X = 2) consists of 3 of the 8 equiprobable outcomes; hence its probability 
is 3/8. Similarly, the probability of each of the other events is computed. 
Thus in Figure 4-1 we obtain the probability function of X.

The mathematical definition of a random variable is "a numerical-valued
function defined over a sample space." But for our purposes we can
be less abstract; it is sufficient to observe that:

A discrete random variable takes on various values
with probabilities specified in its probability function.          (4-2)

In our specific example, the random variable X (number of heads) takes on
the values 0, 1, 2, 3, with probabilities specified by the probability function
in Figure 4-1b.¹
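
The passage from sample space to probability function can be made concrete with a short enumeration. In this Python sketch the names `outcomes`, `counts` and `p` are assumptions chosen for illustration. Each of the 8 equiprobable outcomes is mapped to its number of heads, and the outcomes sharing the same value x are pooled to give p(x).

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Original sample space: 8 equiprobable outcomes of 3 fair tosses.
outcomes = list(product("HT", repeat=3))

# X maps each outcome to a number: the total number of heads.
counts = Counter(outcome.count("H") for outcome in outcomes)

# Probability function p(x) = Pr(X = x), pooled over outcomes with value x.
p = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, px in p.items():
    print(x, px)
```

Once `p` has been built, questions about X can be answered without returning to the 8 original outcomes; for instance `p[0] + p[1]` gives the probability of 1 head or fewer.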

1 Although the intuitive definition (4-2) will serve our purposes well enough, it is not always
as satisfactory as the more rigorous mathematical definition which stresses the random
variable's relation to the original sample space. Thus, for example, in tossing 3 coins,
the random variable Y = total number of tails, is seen to be a different random variable
from X = total number of heads. Yet X and Y have the same probability distribution,

(cont'd)

[Figure 4-1, panel (a): each of the 8 outcomes of the old sample space is
mapped onto its number of heads x in the new, smaller sample space:

    x      p(x)
    0      1/8
    1      3/8
    2      3/8
    3      1/8

Panel (b): graph of p(x) against x = 0, 1, 2, 3.]

FIG. 4-1 (a) X, the random variable "number of heads in three tosses." (b) Graph of the
probability function.

In the general case of defining a probability function, as in Figure 4-2,
we begin by considering in the original sample space events such as (X = 0),
(X = 1), . . . , in general (X = x); (note that capital X represents the random
variable, and small x a specific value it may take). For these events we
calculate the probabilities and denote² them p(0), p(1), . . . , p(x), . . . . This
probability function p(x) may be presented equally well in any of 3 ways:

1. Table form, as in Figure 4-1a.

2. Graph form, as in Figure 4-1b.

3. By formula, as in Equation (4-7) given later on.

The purpose of a random variable is clear from Figures 4-1 and 4-2:



and anyone who. used the loose definition (4-2) might be deceived into thinking that they 
were the same random variable. In conclusion, there is more to a random variable than its 
probability function. 

2 This notation, like any other, may be regarded simply as an abbreviation for convenience.
Thus, for example, p(3) is short for Pr (X = 3), which in turn is short for "the probability
that the number of heads is three." Note that when X = 3 is abbreviated to 3, Pr is
correspondingly abbreviated to p.

[Figure 4-2: the old outcome set, with points e_1, e_2, . . . , e, . . . and
probabilities Pr (e_1), Pr (e_2), . . . , Pr (e), . . . , is mapped onto a new, smaller
set of numbers 0, 1, . . . , x, . . . with probabilities p(0), p(1), . . . , p(x), . . . .]

FIG. 4-2 A general random variable X as a mapping of the original outcome set onto a
condensed set of numbers. (The set of numbers illustrated is 0, 1, 2, . . . , the set of positive
integers. We really ought to be more general, however, allowing both negative values and
fractional (or even irrational) values as well. Thus our notation, strictly speaking, should
be x_1, x_2, . . . , x_i, . . . rather than 0, 1, 2, . . . , x, . . . .)



a complicated sample space (outcome set) is reduced to a much smaller,
numerical sample space. The original sample space is introduced to enable
us to calculate the probability function p(x) for the new space; having served
its purpose, the old unwieldy space is then forgotten. The interesting questions
can be answered very easily in the new space. For example, referring to
Figure 4-3, what is the probability of 1 head or fewer? We simply add up the
relevant probabilities in the new sample space

$$\Pr(X \le 1) = p(0) + p(1) = \tfrac{1}{8} + \tfrac{3}{8} = \tfrac{1}{2}$$   (4-2)

The answer could have been found, but with more trouble, in the original
sample space.

FIG. 4-3 The event X ≤ 1 in both sample spaces, illustrating the easier calculation in
the new sample space.



EXAMPLE

In the same experiment of 3 fair tosses of a coin, let Y = number of
changes in the sequence. For example, for the sequence HTT, the value of Y
is 1, because there is one changeover from H to T. In Figure 4-4 we use the
technique developed above to define this random variable and its probability
function.

[Figure 4-4: the outcomes of the old sample space mapped onto the values
y = 0, 1, 2, with a column for p(y).]

FIG. 4-4 The random variable Y ("number of changes in sequence of 3 tosses of a coin")
and its probability distribution.
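
The same technique can be written once and reused for any random variable defined on the coin-tossing sample space. The Python sketch below is an illustration, not the text's own procedure: `probability_function` and `changes` are names assumed here. It enumerates the equiprobable outcomes, evaluates the random variable on each, and pools the probabilities.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def changes(sequence):
    """Number of changes in sequence, e.g. ('H', 'T', 'T') has 1."""
    return sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)


def probability_function(variable, n_tosses):
    """Tabulate Pr(variable = value) over n fair tosses of a coin."""
    outcomes = list(product("HT", repeat=n_tosses))
    counts = Counter(variable(o) for o in outcomes)
    return {v: Fraction(counts[v], len(outcomes)) for v in sorted(counts)}


p_Y = probability_function(changes, 3)
for y, py in p_Y.items():
    print(y, py)
```

Passing a different function, or a different number of tosses, gives the tabulations asked for in the problems that follow; the fair-coin assumption is built into the equal weighting of outcomes.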



PROBLEMS 



In each case, tabulate the probability function of the random variable
by first constructing a sample space of the experimental outcomes.

4-1 In 4 fair tosses of a coin, let

(a) X = number of heads.

(b) Y = number of changes in sequence.

4-2 Let X be the total number of dots showing when two fair dice are tossed.

4-3 Two boxes each contain 6 slips of paper numbered 1 to 6. Two slips of
paper are drawn, one from each of the boxes. Let X be the difference
between the numbers drawn (absolute value).

=> 4-4 To review Chapter 2, consider the experiment of tossing 3 coins; the
number of heads X may be 0, 1, 2, or 3. Repeat this experiment 50 times³
to obtain 50 values of X, so that you can

(a) Construct a relative frequency table of X.

(b) Graph this relative frequency table.

(c) Calculate the sample mean X̄ from (2-1b).

(d) Calculate the mean squared deviation from (2-5b).

(e) If the experiment were repeated millions of times, to what value would

1. The relative frequencies tend?

2. X̄ tend?

3. MSD (mean squared deviation) tend?

4. s² tend?

4-2 MEAN AND VARIANCE 



In Chapter 3 we defined probability as limiting relative frequency. Now 
we notice the close relation between the relative frequency table observed in 
Problem 4-4 and the probability table calculated in Figure 4-1, for tossing 
3 coins. If the sample size were increased without limit, (i.e., if we continued 
to toss ad infinitum), the relative frequency table would settle down to 
the probability table. 

From the relative frequency table (Problem 4-4), we calculated the mean
X̄ and variance s² of our sample.⁴ It is natural to calculate analogous population
values from the probability table, and call them the mean μ and variance
σ² of the probability distribution p(x), or of the random variable X itself.
Thus

Population mean,

$$\mu \triangleq \sum_x x\,p(x)$$   (4-3) cf. (2-1b)

Population variance,

$$\sigma^2 \triangleq \sum_x (x - \mu)^2\,p(x)$$   (4-4) cf. (2-5b)

3 Or simulate this by consulting the random numbers in Appendix Table IIa (with an
even number representing a head, and an odd number a tail); or else use the authors'
results, as follows:

03220 11232 11221 22213 13332
12212 12121 11233 21112 11213

4 Strictly speaking, we calculated the mean squared deviation, rather than s². However, as
n → ∞, they become practically indistinguishable.

For our example of tossing three coins, we compute μ and σ² in Table 4-1.⁵
Note the analogy here to our calculations in Table 2-4.

We call μ the "population mean," since it is based on the population of
all possible tosses of the coins. On the other hand, the mean X̄ is called a
"sample mean," since it is based on a mere sample of tosses drawn from the
parent population of all possible tosses. Similarly σ² and s² represent population
and sample variance, respectively. A clear distinction between population
and sample values is crucial; we return to this point in Chapters 6 and 7.

Table 4-1 Calculation of the Mean and Variance of a Random Variable

    Given           Calculation                                            Easier Calculation
    Probability     of μ from     Calculation of σ² from (4-4)             of σ², Using
    Function        (4-3)                                                  (4-5)

    x     p(x)      x p(x)        (x - μ)    (x - μ)²    (x - μ)² p(x)     x² p(x)

    0     1/8       0             -3/2       9/4         9/32              0
    1     3/8       3/8           -1/2       1/4         3/32              3/8
    2     3/8       6/8           +1/2       1/4         3/32              12/8
    3     1/8       3/8           +3/2       9/4         9/32              9/8

                    μ = 12/8                             σ² = 24/32        Σ x² p(x) = 24/8
                      = 1.50                                = .75                 μ² = 18/8

                                                                                  σ² = 6/8

Since the definitions of μ and σ² parallel those of X̄ and s², we find
parallel interpretations. We continue to think of the mean μ as a weighted
average, using probability weights rather than relative frequency weights.
The mean is also a fulcrum and center. The standard deviation is a measure of
spread.
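
The entries of Table 4-1 can be reproduced with exact fractions. The following Python sketch is illustrative; the variable names `mu`, `var_definition` and `var_shortcut` are assumptions made here. It evaluates the mean from (4-3) and the variance in two ways, from the definition (4-4) and from the easier formula (4-5).

```python
from fractions import Fraction

# Probability function of X = number of heads in 3 fair tosses.
p = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

# (4-3): the mean is the probability-weighted average of the values x.
mu = sum(x * px for x, px in p.items())

# (4-4): the variance is the weighted average of squared deviations from mu.
var_definition = sum((x - mu) ** 2 * px for x, px in p.items())

# (4-5): the easier calculation, sum of x^2 p(x) minus mu^2.
var_shortcut = sum(x ** 2 * px for x, px in p.items()) - mu ** 2

print(mu, var_definition, var_shortcut)
```

The two variance calculations agree exactly, which is the check suggested by the last column of the table.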



5 The computai 



This formula, 



. ion of a 2 is often simplified by using: 

a 2 = ^x 2 p(x) - jn 2 (4-5) 

with its proof, is analogous to (2-7). The computation is illustrated in the 
last column of Table 4-1. 

Proof that (4-5) is equivalent to (4-4). Reexpress (4-4) as: 

a 2 = 2 (x 2 - 2/ax + M 2 ) p(x) 
and noting that ,u is a constant: 

Since J x p( x ) ~ A* anc * ^,p( x ) — 1» we have 

— ^ x2 p( x ) ~ A* 2 (4-5) proved 
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When a random variable is linearly transformed, the new mean and 
variance behave in exactly the same way as when sample observations were 
transformed in Section 2-5 (the proof is quite analogous and is left as an 
exercise). For future reference, we state these results in Table 4-2. 
We could write out verbally all the information in this table, working across 
the rows, as follows: 



Table 4-2 Linear Transformation (Y) of a Random Variable (X)

    Random Variable    Mean                    Variance                        Standard Deviation

    $X$                $\mu_X$                 $\sigma_X^2$                    $\sigma_X$
    $Y = a + bX$       $\mu_Y = a + b\mu_X$    $\sigma_Y^2 = b^2 \sigma_X^2$   $\sigma_Y = |b|\, \sigma_X$

"Consider the random variable X, with mean $\mu_X$ and variance $\sigma_X^2$.
If we define a new random variable Y as a linear function of X (specifically
$Y = a + bX$), then the mean of Y will be $a + b\mu_X$, and its variance
will be $b^2 \sigma_X^2$."
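
The rules of Table 4-2 can be tried on a small case. This sketch assumes the three-coin distribution of Table 4-1 and two arbitrary constants, a = 1 and b = -3, chosen only for illustration. It tabulates the probability function of Y directly and compares the resulting mean and variance with the shortcut.

```python
from fractions import Fraction as F
from math import sqrt, isclose

def mean(p):
    return sum(x * px for x, px in p.items())

def variance(p):
    mu = mean(p)
    return sum((x - mu) ** 2 * px for x, px in p.items())

def transform(p, a, b):
    """Probability function of Y = a + b*X, given that of X."""
    q = {}
    for x, px in p.items():
        y = a + b * x
        q[y] = q.get(y, 0) + px
    return q

p_x = {0: F(1, 8), 1: F(3, 8), 2: F(3, 8), 3: F(1, 8)}
a, b = 1, -3  # arbitrary illustrative constants, not from the text

p_y = transform(p_x, a, b)
# Direct calculation from the probability function of Y ...
mu_y, var_y = mean(p_y), variance(p_y)
# ... agrees with the shortcut of Table 4-2.
assert mu_y == a + b * mean(p_x)
assert var_y == b ** 2 * variance(p_x)
assert isclose(sqrt(var_y), abs(b) * sqrt(variance(p_x)))
print(mu_y, var_y)  # -7/2 27/4
```

A negative b is used deliberately: the mean changes sign with b, while the standard deviation depends only on |b|.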



PROBLEMS 

4-5 Compute $\mu$ and $\sigma^2$ for the probability distributions in Problem 4-1.
As a check, compute $\sigma^2$ in 2 ways: from the definition (4-4), and from
the easy formula (4-5).

(4-6) Compute $\mu$ and $\sigma^2$ for the random variables of

(a) Problem 4-2.

(b) Problem 4-3.

4-7 Letting X = the number of dots rolled on a fair die, find $\mu_X$ and $\sigma_X$.
If $Y = 2X + 4$, calculate $\mu_Y$ and $\sigma_Y$ in 2 ways:

(a) By tabulating the probability function of Y, then using (4-3) and
(4-5).

(b) By Table 4-2.

(4-8) A bowl contains tags numbered from 1 to 10. There are ten 10's, nine
9's, etc. Let X denote the number on a tag drawn at random.

(a) Make a table of its probability function.

(b) Find $\mu_X$ and $\sigma_X$.

4-9 A student is given 4 questions, each with a choice of 3 answers. Let
X be the number of correct answers when the student has to guess each
answer. Compute the probability function and the mean and variance
of X.

4-10 Let X be a random variable with mean $\mu$ and standard deviation $\sigma$.
What are the mean and standard deviation of Z, where

    $Z = \frac{X - \mu}{\sigma}$

(This introduces Section 4-5.)

4-11 Suppose that the whole population of American families yields the
following table for family size. (For simplicity, the data is slightly
altered by truncating at 6.)

    No. Children              0      1      2      3      4      5      6
    Proportion of families    .43    .18    .17    .11    .06    .03    .02

Source. Statistical Abstract of U.S., 1963, p. 41.

(a) Let X be the number of children in a family selected at random.
(This selection may be done by lots: imagine each family being
represented on a slip of paper, the slips well mixed, and then one slip
drawn.) The probability function of X is given in the table, of course.
Find $\mu_X$ and $\sigma_X$.

(b) Now let a child be selected at random (rather than a family), and
let Y be the number of children in his family. (This selection may be
done by a teacher, for example, who picks a child by lot from the
register of children.) What are the possible values of Y? Complete the
probability table, and compute $\mu_Y$ and $\sigma_Y$.

(c) Is $\mu_X$ or $\mu_Y$ more properly called the "average family size"?



4-3 BINOMIAL DISTRIBUTION 

There are many types of discrete random variables. We shall study
one, the binomial, as an example of how a general formula can be developed
for the probability function $p(x)$.

The classical example of a binomial variable is

    X = number of heads in n tosses of a coin

In order to generalize, we shall speak of n independent "trials," each resulting
in either "success" or "failure," with respective probabilities $\pi$ and $(1 - \pi)$.
Then the total number of successes X is defined as a binomial random
variable.

There are many practical random variables of this type, some of which
are listed in Table 4-3.

Table 4-3 Examples of Binomial Variables

    "Trial"                   "Success"   "Failure"       $\pi$                      $n$                      $X$

    Tossing a fair coin       Head        Tail            1/2                        n tosses                 Number of heads

    Birth of a child          Boy         Girl            Practically 1/2            Family size              Number of boys in family

    Throwing 2 dice           7 dots      Anything else   6/36                       n throws                 Number of sevens

    Drawing a voter in        Democrat    Republican      Proportion of Democrats    Sample size              Number of Democrats
    a poll                                                in the population                                   in the sample

    The history of one atom   Decay       No change       Very small                 Very large, the number   Number of radioactive
    which may radioactively                                                          of atoms in the sample   decays
    decay during a certain
    time period

We shall now derive a simple formula for the probability

function $p(x)$. First, consider the special case in which we compute the
probability of getting 3 heads in tossing 5 coins (Figure 4-5a). Each point in
our outcome set is represented as a sequence of five of the letters S (success)
and F (failure). We concentrate on the event three heads ($X = 3$), and show
all outcomes that comprise this event. In each of these outcomes S appears
three times, and F twice.

FIG. 4-5 Computing binomial probability. (a) Special case: 3 heads in 5 tosses of a
coin. (b) General case: x successes in n trials. [In each panel the outcome set runs from
(SS...S) to (FF...F), with the outcomes of the event X = 3 (or X = x) marked.]

Since the probability of S is $\pi$ and of F is $(1 - \pi)$, the probability of the sequence
SSSFF is

    $\pi \cdot \pi \cdot \pi \cdot (1 - \pi) \cdot (1 - \pi) = \pi^3 (1 - \pi)^2$

In general, the probability of a sequence of x S's followed by (n - x) F's,

    SS...S FF...F

is

    $\pi \cdot \pi \cdots (1 - \pi) \cdot (1 - \pi) \cdots = \pi^x (1 - \pi)^{n-x}$

this multiplication being justified by the independence of the trials. We
further note that any outcome in this event has the same probability. For
example, the probability of SFSSF is

    $\pi \cdot (1 - \pi) \cdot \pi \cdot \pi \cdot (1 - \pi) = \pi^3 (1 - \pi)^2$

The same factors appear; they are only ordered differently.

Now we only have to determine how many such sequences (outcomes)
are included in this event. This is precisely the number of ways in which the
three S's and two F's can be arranged. This number of ways is designated as
$\binom{5}{3}$ and is⁶

    $\binom{5}{3} = \frac{5!}{3!\,(5-3)!} = 10$

or, in general, $\binom{n}{x}$, which is

    $\binom{n}{x} = \frac{n!}{x!\,(n-x)!}$

To summarize: our event $(X = 3)$ includes $\binom{5}{3} = \frac{5!}{3!\,2!}$ outcomes,
each with a probability $\pi^3 (1 - \pi)^2$. Hence its probability is:

    $p(3) = \Pr(X = 3) = \binom{5}{3} \pi^3 (1 - \pi)^2$

In general, the event $(X = x)$ includes $\binom{n}{x} = \frac{n!}{x!\,(n-x)!}$ outcomes,
each with a probability $\pi^x (1 - \pi)^{n-x}$. Hence its probability is:

    $p(x) = \binom{n}{x} \pi^x (1 - \pi)^{n-x}$    (4-7)

We summarize with Figure 4-6.
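
Formula (4-7) translates directly into a few lines of code. The sketch below is one possible rendering, assuming Python's standard `math.comb` for the count of arrangements; the function name `binomial_pmf` is an illustrative choice, not notation from the text.

```python
from math import comb

def binomial_pmf(x, n, pi):
    """Probability of exactly x successes in n independent trials, as in (4-7)."""
    return comb(n, x) * pi ** x * (1 - pi) ** (n - x)

# Special case worked in the text: 3 heads in 5 tosses.
# comb(5, 3) counts the arrangements of three S's and two F's.
print(comb(5, 3))               # 10
print(binomial_pmf(3, 5, 0.5))  # 0.3125 for a fair coin (pi = 1/2)

# The probabilities over x = 0, 1, ..., n should sum to 1.
total = sum(binomial_pmf(x, 5, 0.5) for x in range(6))
```

The fair-coin value of `pi` here is an assumption made for the example; any probability between 0 and 1 can be substituted.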

6 This formula is developed as follows. Suppose we wish to fill five spots with five objects,
designated $S_1, S_2, S_3, F_1, F_2$. We have a choice of 5 objects to fill the first spot, 4 the second,
and so on; thus the number of options we have is:

    $5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 = 5!$    (4-6)

(cont'd)

FIG. 4-6 Computing binomial probability of x successes in n trials. [The figure groups the
outcomes, from (FF...F) to (SS...S), by their number of successes x. Each outcome in group x
has probability $\pi^x (1 - \pi)^{n-x}$, giving $p(0) = (1 - \pi)^n$, $p(1) = n\pi(1 - \pi)^{n-1}$, ..., $p(n) = \pi^n$.]



As a final example, we return to our previous experiment of tossing
three fair coins. What is the probability of two heads? Each toss is an
independent trial in which $\pi = 1/2$. Noting also that $n = 3$ and $x = 2$, we have

    $\Pr(X = 2) = \binom{3}{2} \left(\tfrac{1}{2}\right)^2 \left(\tfrac{1}{2}\right)^{3-2} = \frac{3!}{2!\,1!} \left(\tfrac{1}{2}\right)^3 = \frac{3}{8}$    (4-8)

a confirmation of a previous result.



But this is not the problem at hand; in our case we cannot distinguish between $S_1$, $S_2$,
and $S_3$, all of which appear as S. Thus many of our separate and distinct arrangements
in (4-6) (e.g., $S_1 S_2 S_3 F_1 F_2$ and $S_2 S_1 S_3 F_1 F_2$) cannot be distinguished, and appear as the single
arrangement SSSFF. Thus (4-6) involves serious double counting. How much?

We double counted $3 \cdot 2 \cdot 1 = 3!$ times because we assumed in (4-6) that we could
distinguish between $S_1$, $S_2$, and $S_3$ when in fact we cannot. (3! is simply the number of distinct
ways of arranging $S_1$, $S_2$, and $S_3$.) Similarly, we double counted $2 \cdot 1 = 2!$ times because we
assumed in (4-6) that we could distinguish between $F_1$ and $F_2$, when in fact we cannot.
When (4-6) is deflated for double counting in both these ways, we have

    $\binom{5}{3} = \frac{5!}{3!\,2!} = \frac{5!}{3!\,(5-3)!}$

PROBLEMS

Note that the binomial coefficients $\binom{n}{x}$, as well as the complete binomial
distribution $p(x)$, are tabulated in Table III of the Appendix, for your optional use.

4-12 (a) Construct a diagram similar to Figure 4-6 to obtain the probability
function for the number of heads X when 4 coins are tossed; use
general $\pi$.

(b) Then set $\pi = 1/2$, to obtain the results for a fair coin.

(c) From (b), calculate $\mu$ and $\sigma^2$.

(d) Graph the probability function of (b), showing $\mu$.

4-13 A ball is drawn from a bowl containing 2 red, 1 blue, and 7 black
balls. The ball is replaced, and a second ball is drawn, and so on until
3 balls have been drawn (sampling with replacement).

(a) Let X = the total number of red balls drawn. Tabulate its
probability function. Find $\mu$ and $\sigma^2$. Graph.

(b) Repeat (a), for Y = the total number of blue balls drawn.

4-14 Check the probability function of Problem 4-9 using the formulas of
this Section.

(4-15) In rolling 3 dice, let X be the number of aces that occur. Tabulate the
probability function of X. Find $\mu$ and $\sigma^2$. Graph.

4-16 On a blind toss of a dart, suppose the probability of hitting the target
is 1/5. What is the probability that in 6 tosses you will hit the target
exactly 2 times? At most 2 times? At least 3 times?

=> 4-17 On the basis of these questions, can you guess the mean of a general
binomial variable, in terms of n and $\pi$? Can you guess the variance?
(This leads into Chapter 6-6.)

*4-18¹ (For calculus students only, leading into Section 4-5.) Graph the
function $f(t) = e^{-t^2/2}$, showing its

(a) Symmetry.

(b) Asymptotes.

(c) Maximum.

(d) Points of inflection.

1 Starred problems are optional, since they are more theoretical and/or difficult than the
rest.

4-4 CONTINUOUS DISTRIBUTIONS

In Chapter 2 we saw how a continuous quantity such as height was best
graphed with a relative frequency histogram. The histogram of heights of
Figure 2-3 is reproduced in Figure 4-7a below. (For purposes of illustration,
we measure height in feet, rather than inches. Furthermore, the y-axis has
been shrunk to the same scale as the x-axis.) Note that in Figure 4-7a relative
frequency is given by the height of each bar; but since its width (or base) is
1/4, its area (height times width) is numerically only 1/4 as large. Thus we
can't use area in this figure to represent relative frequency, since it would
badly understate it. In fact, if we wish area to represent relative frequency each
height must be increased fourfold. This is done in Figure 4-7b, where the
area of each bar is relative frequency, and the height of each bar is called
relative frequency density.

FIG. 4-7 Relative frequency histogram (a) transformed into relative frequency density in
(b), making total area = 1. [Panel (a): relative frequency against height (ft). Panel (b):
relative frequency density against height (ft), with relative frequency given by area and a
unit square of area 1 shown for scale.]

In general

    (relative frequency density)(cell width) = (relative frequency)

i.e.,

    area of any bar = relative frequency.
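
The conversion from relative frequency to relative frequency density is a single division. The sketch below illustrates it under stated assumptions: the cell width of 1/4 ft comes from the text, but the six relative frequencies are hypothetical values invented for the example, not the data of Figure 4-7.

```python
from fractions import Fraction as F

cell_width = F(1, 4)  # feet, as in Figure 4-7
# Hypothetical relative frequencies for six adjacent cells (they sum to 1).
rel_freq = [F(1, 20), F(3, 20), F(6, 20), F(6, 20), F(3, 20), F(1, 20)]

# Height of each bar in the density histogram: relative frequency / cell width.
density = [f / cell_width for f in rel_freq]

# Area of each bar (height times width) recovers the relative frequency.
areas = [d * cell_width for d in density]

print(sum(rel_freq))  # 1  (heights of the original histogram sum to one)
print(sum(areas))     # 1  (areas of the density histogram sum to one)
print(density[2])     # 6/5, four times the relative frequency 6/20
```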

There is but one more important observation. In Figure 4-7a, the heights
sum to one (the sum of all relative frequencies must be one). From the
numerical equivalence of height in Figure 4-7a to area in Figure 4-7b, it
follows that the areas in Figure 4-7b must also sum to one. And this is a key
characteristic of a density function in statistics: it encloses an area numerically
equal to 1.

FIG. 4-8 How relative frequency density may be approximated by a probability density
function as sample size increases, and cell size decreases. (a) Small n, as in Fig. 4-7b.
(b) Large enough n to stabilize relative frequencies. (c) Even larger n, to permit finer cells
while keeping relative frequencies stable. (d) For very large n, this becomes (approximately)
a smooth probability density curve. [Each panel plots density against height (ft); the shaded
area is the relative frequency of men in an interval I, which approximates the probability.]

In Figure 4-8 we show what happens to the relative frequency density of 
a continuous random variable as 

1. Sample size increases. 

2. Cell size decreases. 

With a small sample, chance fluctuations influence the picture. But as sample 
size increases, chance is averaged out, and relative frequencies settle down to 
probabilities. At the same time, the increase in sample size allows a finer 
definition of cells. While the area remains fixed at 1, the relative frequency
density becomes approximately a curve, the so-called probability density
function, which we shall refer to simply as the probability function, designated
$p(x)$.

If we wish to compute the mean and variance from Figure 4-8c, the
discrete formulas (4-3) and (4-4) can be applied. But if we are working with
the probability density function in Figure 4-8d, then integration (which
calculus students will recognize is the limiting case of summation) must be
used; if a and b are the limits of X, then (4-3) and (4-4) become

    Mean, $\mu = \int_a^b x\, p(x)\, dx$    (4-9)

    Variance, $\sigma^2 = \int_a^b (x - \mu)^2 p(x)\, dx$    (4-10)

All the theorems that we state about discrete random variables are
equally valid for continuous random variables, with summations replaced by
integrals. Proofs are also very similar. Therefore, to avoid tedious duplication,
we give theorems for discrete random variables only, leaving it to the reader
to supply the continuous case himself, if he so desires.

4-5 THE NORMAL DISTRIBUTION

For many random variables, the probability density function is a specific
bell-shaped curve, called the normal curve, or Gaussian curve, as shown in
Figures 4-9 to 4-12. It is the single most useful probability function in
statistics. Many variables are normally distributed; for example, errors that
are made in measuring physical and economic phenomena often are normally
distributed. In addition, there are other useful probability functions (such as
the binomial) which often can be approximated by the normal curve.



(a) Standard Normal Distribution

The probability function of the standard normal variable Z is

    $p(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$    (4-11)

The constant $1/\sqrt{2\pi}$ is a scale factor required to make the total area 1. The
symbols $\pi$ and $e$ denote important mathematical constants, approximately
3.14 and 2.718 respectively. We draw the normal curve in⁷ Figure 4-9 to
reach a maximum at $z = 0$. We confirm in (4-11) that this is so: as we move
to the left or right of 0, $z^2$ increases; since its negative exponent is increasing
in size, $p(z)$ decreases. Moreover, the further we move away from zero, the
more $p(z)$ decreases; as z takes on very large (positive or negative) values,
the negative exponent in (4-11) becomes very large and $p(z)$ approaches zero.
Finally, this curve is symmetric. Since z appears only in squared form, $-z$
generates the same probability in (4-11) as $+z$. This confirms the shape
of this standard normal curve as we have drawn it in Figure 4-9. The mean and
variance of Z can be calculated by integration using (4-9) and (4-10); since
this requires calculus, we quote the results without proof:

    $\mu_Z = 0$
    $\sigma_Z = 1$

It is for this very reason, in fact, that Z is called a standard normal variable.
Later when we speak of "standardizing" any variable, this is precisely what
we mean: shifting it so that its mean is 0 and shrinking (or stretching) it so
that its standard deviation (or variance) is one.

FIG. 4-9 (a) Standard normal curve. (b) Vertical axis rescaled.

7 In Problem 4-18 you may have confirmed that the graph of (4-11) is that shown in Figure
4-9. The mathematical constant $\pi \approx 3.14$ is not to be confused with the $\pi$ used in
Section 4-3 to designate probability of success.
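
Readers without calculus can still check the quoted results numerically, by adding up thin rectangles under the curve in the spirit of Figure 4-8c. The following is a rough sketch only: the midpoint rule, the number of cells, and the cutoff at ±10 are all choices made for this illustration, not part of the text.

```python
from math import exp, pi, sqrt

def p(z):
    """Standard normal probability function, as in (4-11)."""
    return exp(-z * z / 2) / sqrt(2 * pi)

def integrate(f, a, b, cells=20000):
    """Midpoint rule: total area of `cells` thin rectangles under f."""
    width = (b - a) / cells
    return sum(f(a + (i + 0.5) * width) for i in range(cells)) * width

# Beyond |z| = 10 the curve is negligibly small, so [-10, 10] stands in
# for the whole axis (an approximation chosen for this sketch).
area = integrate(p, -10, 10)
mean = integrate(lambda z: z * p(z), -10, 10)                    # as in (4-9)
variance = integrate(lambda z: (z - mean) ** 2 * p(z), -10, 10)  # as in (4-10)
print(area, mean, variance)
```

Up to rounding error, the three printed values should be close to 1, 0, and 1 respectively.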

The probability (area) enclosed by the normal curve between the mean 
(0) and any specified value (say z 0 ) also requires calculus to evaluate precisely, 
but may be easily pictured in Figure 4-10. 

This evaluation of probability, done once and for all, has been recorded 
in Table IV of the Appendix. Students without calculus can think of this as 
accumulating the area of the approximating rectangles, as in Figure 4-8c. 

To illustrate this table, consider the probability that Z falls between .6
and 1.3, as shown in Figure 4-11a. From Table IV in the Appendix we note
that the probability that Z falls between 0 and .6 is .2257; similarly the
probability that Z falls between 0 and 1.3 is .4032. We require the difference in
these two, namely:

    $\Pr(.6 < Z < 1.3) = .4032 - .2257 = .1775$

In Figure 4-11b we consider the probability that Z falls between $-1$ and $+2$.
Because of the symmetry of the normal curve, the probability that Z falls
between $-1$ and 0 is identical to the probability between 0 and $+1$, which is
.3413. In this instance we add this to the probability of Z between 0 and
2, namely .4772, which yields

    $\Pr(-1 < Z < 2) = .3413 + .4772 = .8185$

FIG. 4-11 Standard normal probabilities. (a) $\Pr(.6 < Z < 1.3)$. (b) $\Pr(-1 < Z < 2)$.

Finally, the student may confirm that the probability enclosed between one
standard deviation above and below the mean ($-1 < Z < +1$) is .6826, or
just over 2/3 of the area of the normal curve.
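
The table lookups above can be reproduced without the Appendix. This sketch assumes the standard identity relating the normal area to the error function, available in Python as `math.erf`; the helper name `area_from_mean` is illustrative. Because Table IV is rounded to four places, sums of tabled values can differ from the computed ones in the last digit.

```python
from math import erf, sqrt

def area_from_mean(z0):
    """Area under the standard normal curve between 0 and z0 (z0 >= 0)."""
    return 0.5 * erf(z0 / sqrt(2))

# Pr(.6 < Z < 1.3): difference of two tabled areas.
print(round(area_from_mean(1.3) - area_from_mean(0.6), 4))  # 0.1775

# Pr(-1 < Z < 2): by symmetry, add the areas for 1 and for 2.
pr_b = area_from_mean(1) + area_from_mean(2)

# Pr(-1 < Z < +1): twice the area between 0 and 1.
pr_c = 2 * area_from_mean(1)
```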



PROBLEMS 

4-19 If Z is a standard normal variable, use Appendix Table IV to evaluate:

(a) $\Pr(-2 < Z < +2)$.

(b) $\Pr(-\infty < Z < 1.64)$.

(c) $\Pr(-2.33 < Z < \infty)$.

(d) $\Pr(-2 < Z)$.

(e) $\Pr(Z < 2)$.

4-20 (a) If $\Pr(-z_0 < Z < z_0) = .95$, what is $z_0$?
(b) If $\Pr(-z_0 < Z < z_0) = .99$, what is $z_0$?



(b) General Normal Distribution

If a random variable X has a normal probability curve, with mean $\mu$
and standard deviation $\sigma$, its probability function is⁸ written:

    $p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$    (4-12)

8 To prove that (4-12) is centered at $\mu$, we note that the peak of the curve occurs when the
negative exponent attains its smallest value 0, i.e., when $x = \mu$. It may also be shown that
(4-12) is scaled by the factor $\sigma$. Finally, it is bell shaped for the same reasons given in
part (a).

We notice that in the very special case in which $\mu = 0$ and $\sigma = 1$, (4-12)
reduces to the standard normal distribution (4-11). But more important,
regardless of what $\mu$ and $\sigma$ may be, we can translate any normal variable
X in (4-12) into the standard form (4-11) by defining:

    $Z = \frac{X - \mu}{\sigma}$    (4-13)

FIG. 4-12 Linear transformation of any normal variable into the standard normal
variable. [The upper curve is the general normal variate X; the lower curve is the standard
normal variate Z.]

Z is recognized as just a linear transformation of X, as shown in Figure 4-12.
Notice that whereas the mean and standard deviation of a general normal
variate X can take on any values, the standard normal variate Z is unique,
with mean 0 and standard deviation 1 as proved in Problem 4-10.

To evaluate any normal variate X, we therefore translate X into Z,
and then evaluate Z in the standard normal table (Appendix Table IV).
For example, suppose that X is normal, with $\mu = 100$ and $\sigma = 5$. What is
the probability of getting an X value of 110 or more? That is, we wish to
evaluate

    $\Pr(X \ge 110)$    (4-14)

First (4-14) can be written equivalently⁹ as

    $\Pr\left(\frac{X - 100}{5} \ge \frac{110 - 100}{5}\right)$    (4-15)

which, noting (4-13), is

    $\Pr(Z \ge 2)$    (4-16)

9 Any inequality is preserved if both sides are diminished by the same amount (100) and
divided by the same positive amount (5).

We see that (4-16) is the standardized form of (4-14), and from Table IV
we evaluate this probability to be .0228. Moreover, the standardized form
(4-16) allows a clearer interpretation of our original question; in fact, we
were asking "What is the probability of getting a normal value at least two
standard deviations above the mean?" The answer is: very small, about one
in fifty.

As a final example, suppose a bolt picked at random from a production
line has a length X which is a normal random variable with mean 10 cm and
standard deviation 0.2 cm. What is the probability that its length will be
between 9.9 and 10.1 cm? That is

    $\Pr(9.9 < X < 10.1)$

This may be written in the standardized form

    $\Pr\left(\frac{9.9 - 10}{.2} < \frac{X - 10}{.2} < \frac{10.1 - 10}{.2}\right)$
    $= \Pr(-.50 < Z < .50)$
    $= .38$

These calculations confirm our earlier observation from Figure 4-12:
although there is any number of normal curves, there is only one standard
normal curve. This is fortunate; instead of requiring a whole book of tables,
we only need one (Appendix Table IV).
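
The two worked examples follow the same recipe: standardize with (4-13), then look up Z. The sketch below mirrors that recipe, assuming the error function `math.erf` as a stand-in for Table IV; the function names are hypothetical and chosen for this illustration.

```python
from math import erf, sqrt

def std_normal_cdf(z):
    """Pr(Z <= z) for the standard normal variable Z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_prob(lo, hi, mu, sigma):
    """Pr(lo < X < hi) for normal X, found by standardizing as in (4-13)."""
    z_lo = (lo - mu) / sigma
    z_hi = (hi - mu) / sigma
    return std_normal_cdf(z_hi) - std_normal_cdf(z_lo)

# First example: mu = 100, sigma = 5, Pr(X >= 110) = Pr(Z >= 2).
tail = normal_prob(110, float("inf"), 100, 5)
# Bolt example: mu = 10, sigma = 0.2, Pr(9.9 < X < 10.1) = Pr(-.5 < Z < .5).
bolt = normal_prob(9.9, 10.1, 10, 0.2)
print(round(tail, 4), round(bolt, 2))  # 0.0228 0.38
```

Only one function of Z is needed however $\mu$ and $\sigma$ are chosen, which is the point made in the text about needing a single table.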

PROBLEMS 

4-21 Draw a diagram similar to Figure 4-12 for both the examples solved in
the text directly above. Shade the area being evaluated.

4-22 If X is normal, calculate:

(a) $\Pr(4.5 < X < 6.5)$ where $\mu_X = 5$ and $\sigma_X = 1$

(b) $\Pr(X < 800)$ where $\mu_X = 400$ and $\sigma_X = 200$

(c) $\Pr(800 < X)$ where $\mu_X = 400$ and $\sigma_X = 200$

4-23 Suppose that a population of men's heights is normally distributed
with a mean of 68 inches, and standard deviation of 3 inches. Find the
proportion of the men who

(a) Are over 6 feet

(b) Are under 5 feet 6 inches

(c) Are between 5 feet 6 inches and 6 feet.

To check your 3 answers, see whether they sum to 1.

4-6 A FUNCTION OF A RANDOM VARIABLE

Looking again at the experiment of tossing three coins, let us suppose
that there is a reward R depending upon the number X of heads we toss.
Formally, we might state that R is a function of X, or

    $R = g(X)$

Now let us suppose that the specific form of this function is

    $R = (X - 1)^2$

which is equally well given by Table 4-4.

Table 4-4 Tabled Form of the Function $R = (X - 1)^2$

    Value of X    Value of R

    0             $(0 - 1)^2 = 1$
    1             $(1 - 1)^2 = 0$
    2             $(2 - 1)^2 = 1$
    3             $(3 - 1)^2 = 4$

The values of R are customarily rearranged in order as shown in the
third column of Table 4-5. Furthermore, the values of R have certain
probabilities which may be deduced from the previous probabilities of X (just as the
probabilities of X were deduced from the probabilities in our original sample
space, in Figure 4-1). Thus we note from Table 4-4 that two values of X
(0 or 2) give rise to an R value of 1. This is indicated with arrows in Table 4-5.
The third and fourth columns in this table show the probability distribution
of R. The last column shows the calculation of the mean of R.

Table 4-5 Calculation of the Probability of Each R Value
from the Probabilities of Various X Values

    (1)    (2)       (3)           (4)       (5)
    x      p(x)      r = g(x)      p(r)      r p(r)

    0      1/8       0             3/8       0
    1      3/8       1             4/8       4/8
    2      3/8       4             1/8       4/8
    3      1/8

    (Columns 1 and 2 are ordered by x; columns 3 to 5 are ordered by r. The arrows link
    x = 1 to r = 0, x = 0 and x = 2 to r = 1, and x = 3 to r = 4.)

R is a random variable; although it has been "derived" from X, it has
all the properties of an ordinary random variable. The mean of R can be
computed from its probability distribution, as in Table 4-5, and is found to
be 1.0. But if it is more convenient, the answer can be derived from the
probability distribution of X, as in Table 4-6.



Table 4-6 Mean of $R = (X - 1)^2$, Calculated from $p(x)$

    x      p(x)      g(x)      g(x) p(x)

    0      1/8       1         1/8
    1      3/8       0         0
    2      3/8       1         3/8
    3      1/8       4         4/8

                               $\mu_R$ = 8/8 = 1.0

It is easy to see why this works; in a disguised way we are calculating
$\mu_R$ in the same way as in Table 4-5. The first and third lines of Table 4-6
appear together as the second line of Table 4-5. Also, the second and fourth
lines of Table 4-6 correspond to the first and third lines of Table 4-5. Thus
Table 4-6 contains precisely the same information as Table 4-5; it therefore
yields the same value for $\mu_R$. The only difference in the two tables is that 4-6
is ordered according to X values, while 4-5 is ordered (and condensed)
according to R values.
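
The equivalence of the two tables can be seen by carrying out both calculations side by side. This is a minimal sketch using exact fractions; the variable names are illustrative and not from the text.

```python
from fractions import Fraction as F

p_x = {0: F(1, 8), 1: F(3, 8), 2: F(3, 8), 3: F(1, 8)}

def g(x):
    return (x - 1) ** 2

# Route 1 (Table 4-5): first condense p(x) into the probability function of R.
p_r = {}
for x, px in p_x.items():
    p_r[g(x)] = p_r.get(g(x), 0) + px
mean_from_r = sum(r * pr for r, pr in p_r.items())

# Route 2 (Table 4-6): weight g(x) directly by p(x), never forming p(r).
mean_from_x = sum(g(x) * px for x, px in p_x.items())

print(p_r[0], p_r[1], p_r[4])    # 3/8 1/2 1/8
print(mean_from_r, mean_from_x)  # 1 1
```

The condensing step in route 1 is exactly the merging of the x = 0 and x = 2 rows indicated by the arrows of Table 4-5.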

This example can be generalized, as follows. If X is a random variable,
and g is any function, then $R = g(X)$ is a random variable. $\mu_R$ may be
calculated either from the probability function of R, or alternatively from
the probability function of X according to

Theorem

    $\mu_R = \sum_x g(x)\, p(x)$    (4-17a)

4-7 NOTATION

Some new notation will help us better understand the various viewpoints
of the mean. For any random variable, X let us say, all the following terms
mean exactly the same thing:¹⁰

    $\mu_X$ = mean of X
            = average X
            = expectation of X
            = $E(X)$, the expected value of X

The term $E(X)$ is introduced because it is useful as a reminder that it represents
a weighted sum, i.e.,

    $E(X) = \sum_x x\, p(x)$    (4-3)

With this new notation, result (4-17a) can be written

    $E(R) = \sum_x g(x)\, p(x)$    (4-17b)

Finally, we recall that R was just an abbreviation for $g(X)$, so that we may
equally well write (4-17b) in an easily remembered form:

Theorem

    $E[g(X)] = \sum_x g(x)\, p(x)$    (4-17c)

As an example of this notation, we may write

    $E(X - \mu)^2 = \sum_x (x - \mu)^2 p(x)$    (4-18)

By (4-4),

    $E(X - \mu)^2 = \sigma^2$    (4-19)

Thus we see that $\sigma^2$ may be regarded as just a kind of expectation, namely,
the expectation of the random variable $(X - \mu)^2$.
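
Theorem (4-17c) suggests treating expectation as a single operation that accepts any function g. The sketch below does so for the three-coin variable; the helper `expectation` is a hypothetical name introduced here for illustration.

```python
from fractions import Fraction as F

def expectation(g, p):
    """E[g(X)] = sum of g(x) * p(x), as in (4-17c)."""
    return sum(g(x) * px for x, px in p.items())

p = {0: F(1, 8), 1: F(3, 8), 2: F(3, 8), 3: F(1, 8)}  # three fair coins

mu = expectation(lambda x: x, p)                  # E(X), as in (4-3)
sigma2 = expectation(lambda x: (x - mu) ** 2, p)  # E(X - mu)^2, as in (4-19)
second_moment = expectation(lambda x: x ** 2, p)  # E(X^2)
print(mu, sigma2, second_moment)  # 3/2 3/4 3
```

In this notation the easier formula (4-5) reads $\sigma^2 = E(X^2) - \mu^2$, which the printed values satisfy: 3 - 9/4 = 3/4.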

10 The reason for the plethora of names is historical. For example, gamblers and economists
use the term "expected gain," meteorologists use the term "mean annual rainfall," and
teachers use the term "average grade."

PROBLEMS

4-24 As in Problem 4-1, let X be the number of heads when 4 coins are
fairly flipped.

(a) If $R(X) = X^2 - 3X$, find its probability function, and $\mu_R$ and $\sigma_R^2$.

(b) Find $E|X - 2|$ in 2 ways:

(1) Using the probability function of $|X - 2|$; and

(2) Using the probability function of X in (4-17)

(c) Find $E(X^2)$

(d) Find $E(X - \mu_X)^2$. Is this related to $\sigma_X$ in any way?

(4-25) Repeat 4-24, letting X be the number of changes in sequence when
4 coins are tossed.

4-26 The time T, in seconds, required for a rat to run a maze, is a random
variable with the following probability function.

    t       22    23    24    25    26    27
    p(t)    .1    .1    .3    .2    .2    .1

(a) Find the average time.

(b) Suppose the rat is rewarded with 1 biscuit for each second faster
than 25. (For example, if he takes just 23 seconds, he gets a reward
of 2 biscuits. Of course, if he takes 25 seconds or longer, he gets no
reward.) What is the rat's average reward?



Review Problems 

4-27 In a recent presidential election, 60% of the voters went Democratic,
40% went Republican. If Gallup took a sample of 5 voters at random,
find

(a) The probability that the sample would be all Democrats.

(b) The probability that the sample would correctly forecast the
election winner, i.e., that a majority of the sample would be
Democratic.

(c) In what way is a sample of 5 better than a sample of 1?

4-28 Three coins are independently flipped; let X = number of heads.
Make a table of the probability function, and find $\mu_X$ and $\sigma_X^2$
assuming

(a) The coins are fair.

(b) The last coin is biased, coming "heads up" 3/4 of the time.

4-29 Suppose the amount of cereal in a package cannot be weighed exactly.
In fact, it is a normally distributed random variable, with $\mu = 10.10$
oz. and $\sigma = .040$ oz. On the package is claimed, "net weight, 10 oz."

(a) What proportion of the packages are underweight?

(b) To what value must the mean $\mu$ be raised in order that only 1/10
of 1% of the packages be underweight?

4-30 Eight volunteers had their breathing capacity measured before and
after a certain treatment. The data might have looked like this:

              Breathing Capacity
    Person    Before    After    Improvement

    A         2750      2850     +100
    B         2360      2380     +20
    C         2950      2800     -150
    D
    E

Let us concentrate on whether a given person improves or
deteriorates, i.e., whether the sign of the improvement is + or -.
Supposing that treatment has no effect, on average, what is the
probability that there will be 6 or more + signs? (Assume that
measurements are so precise that a tie is practically impossible.)

(4-31) A person performs a task 3 times in succession. He learns rapidly,
so that his chance of error is 1/2 the first time, 1/4 the second time,
and 1/6 the third time.

We assume that he learns equally well from his successes and
failures, so that the three trials may be considered independent.

(a) Find the probability table and mean of X = the total number of
errors.

(b) What is the probability of more than 1 error?

*4-32 (Requires calculus)

A random variable X is continuous, and has a probability function

p(x) = %x* 0 < x < 2 
= 0 otherwise 

(a) Graph p(x). 

(b) Find the mean, median, and mode. Are they in the order you 
expect? 

(c) Find $\sigma^2$.



chapter 5

Two Random Variables



5-1 DISTRIBUTIONS 



The first section is a simple extension of the last two chapters. The main
problem will be to recognize the old ideas behind the new names. Therefore
we outline this section in Table 5-1, as both an introduction and review.



Table 5-1 Review of Section 5-1, Showing the Origins of the Ideas

    Old Idea                                                  Application (new terminology)

    $\Pr(G \cap H)$                                 (3-11)    Joint probability function
      applied to
      $\Pr(X = 2 \cap Y = 1)$                                   $p(2, 1)$                        (5-2a)
      $\Pr(X = x \cap Y = y)$ in general                        $p(x, y)$ in general             (5-2b)

    $\Pr(H/G) = \frac{\Pr(H \cap G)}{\Pr(G)}$       (3-22)    Conditional probability function
      applied to
      $\Pr(X = 2 / Y = 1)$                                      $p(2 / Y = 1)$
      $\Pr(X = x / Y = y)$ in general                           $p(x / Y = y)$ or $p(x/y)$

    Event F is independent of E if                            Variable X is independent of Y if
      $\Pr(F/E) = \Pr(F)$                           (3-24)      $p(x/y) = p(x)$
      or $\Pr(E \cap F) = \Pr(E)\Pr(F)$             (3-25)      or $p(x, y) = p(x)\,p(y)$

(a) Joint Probability 

In the experiment of tossing a coin three times, let us define (on our 
single sample space) two random variables : 

X = number of heads 

Y = number of changes in sequence 



Table 5-2 Two Random Variables Defined on the Original
Sample Space

    (1)          (2)              (3)
    Outcomes     Corresponding    Corresponding
                 X value          Y value

    HHH          3                0
    HHT          2                1
    HTH          2                2
    HTT          1                1
    THH          2                1
    THT          1                2
    TTH          1                1
    TTT          0                0

We might be interested in the probability of 2 heads and 1 change of
sequence occurring together. As usual, we refer to the sample space of the
experiment (in column 1 of Table 5-2), and look for the intersection of these
two events, obtaining

    $\Pr(X = 2 \cap Y = 1) = 2/8$    (5-1)

For convenience $\Pr(X = 2 \cap Y = 1)$ is abbreviated to $p(2, 1)$    (5-2a)

Similarly we could compute $p(0, 0)$, $p(0, 1)$, $p(0, 2)$, $p(1, 2)$, ..., obtaining
in Table 5-3 what is called the joint (or bivariate) probability function
of X and Y.

The formal definition is

    $p(x, y) = \Pr(X = x \cap Y = y)$    (5-2b)

Table 5-3 $p(x, y)$, The Joint Probability of X and
Y in Three Tosses of a Coin.

                          y = value of Y
    x = value of X        0       1       2       p(x)

    0                     1/8     0       0       1/8
    1                     0       2/8     1/8     3/8
    2                     0       2/8     1/8     3/8
    3                     1/8     0       0       1/8

    p(y)                  2/8     4/8     2/8     1

The general case is illustrated in Figure 5-1. The events X = 0, X = 1,
X = 2, ... form a partition of the sample space, shown schematically as a
horizontal slicing. Similarly, the events Y = 0, Y = 1, ... form a partition
shown as a vertical slicing of the sample space. The intersection of the
horizontal slice X = x and the vertical slice Y = y is the event $(X = x \cap Y = y)$.
Its probability is collected into $p(x, y)$ in the table.

FIG. 5-1 Two random variables (X, Y), showing their sample space and joint
probability function. [The original sample space, sliced horizontally by X and vertically
by Y, is mapped onto a new sample space tabulating p(0, 0), p(1, 0), ..., p(x, y).]

This table, or specifically Table 5-3, may be graphed, but we run into
some typographical difficulties in trying to represent 3 dimensions on a
two-dimensional piece of paper. We shall suggest some possible ways to
resolve this difficulty. First, since the layout of the x and y in Table 5-3 is
arbitrary, we shall change it for convenience, running x across and y up as in
Figure 5-2a (this is the custom in analytic geometry). Then the functional
values $p(x, y)$ may be plotted in the direction of an axis which we imagine
coming up out of the paper, as in Figure 5-2b, or the functional value may
be represented by the size of the dot, as in Figure 5-2c.

FIG. 5-2 Various graphic presentations of the bivariate probability function of Table 5-3.
(a) Realignment of the axes. (b) $p(x, y)$ is represented by a line segment "coming up out of
the paper." (c) $p(x, y)$ is represented by the size of the dot.



(b) Marginal Probability Function 

Suppose we are interested only in X, yet have to work with the joint
probability function of X and Y. How can we compute the probability
function of X, for example p(2) = Pr (X = 2)?

It appears that the probability of this event (i.e., the horizontal slice
X = 2 in the schematic sample space of Figure 5-1) is the sum of the
probabilities of all those chunks comprising it, i.e.,

    $p(2) = p(2, 0) + p(2, 1) + p(2, 2) + p(2, 3) + \cdots + p(2, y) + \cdots$   (5-3)

    $= \sum_y p(2, y)$   (5-4)

and in general, for any given x,

    $p(x) = \sum_y p(x, y)$   (5-5)

For example, this idea may be applied to Table 5-3. We thus find

    $p(2) = 0 + 2/8 + 1/8 = 3/8$

and place this sum in the right-hand margin. Similarly, p(x) is computed for
every x, thus providing the whole column in the right-hand margin. This is
sometimes called the marginal probability distribution of X, to describe how
it was obtained. But, of course, it is just the ordinary probability function of
X (which could have been found without any reference to Y, as indeed it was
in Figure 4-1).

In conclusion, the word "marginal" has no specific technical meaning.
It simply describes how the probability distribution of X may be calculated
when another variable Y is in play; a row sum is calculated and placed "in
the margin."

In an identical way we calculate p(y), the (marginal) probability
distribution of Y; this is set out in the marginal row of Table 5-3; each element in
this row is the sum of the column above. Finally, we note, as expected, the
exact correspondence of this marginal probability distribution of Y with the
probability distribution of Y calculated in Figure 4-4 without any reference
whatsoever to X.
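
The row and column sums can be sketched in a few lines. This is a minimal illustration, assuming the joint probabilities are typed in from Table 5-3 with the zero cells left out; the function name `marginal` is hypothetical.

```python
from fractions import Fraction as F

# Joint probabilities p(x, y) copied from Table 5-3 (zero cells omitted).
joint = {(0, 0): F(1, 8), (1, 1): F(2, 8), (1, 2): F(1, 8),
         (2, 1): F(2, 8), (2, 2): F(1, 8), (3, 0): F(1, 8)}


def marginal(joint, axis):
    """Sum p(x, y) over the other variable: axis=0 gives p(x), axis=1 gives p(y)."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0) + p
    return out


p_x = marginal(joint, 0)  # row sums, the right-hand margin
p_y = marginal(joint, 1)  # column sums, the bottom margin

print(p_x[2])  # 3/8
print(p_y[1])  # 1/2, that is 4/8
```

Each marginal adds up to 1, as any probability function must.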



(c) Conditional Probability Function 

In the example of tossing three coins, we might wish to know the
probabilities of various numbers of heads, given one change in sequence. And, in
general, it is often of interest to know the probability distribution of X, when
Y is given. Thus, let us suppose that Y is known to be 1. The conditional
probability distribution of X, given Y = 1, is designated as p(x/Y = 1).
How is it to be evaluated?

Clearly, we should examine the vertical slice for Y = 1, shown in Figure
5-1 generally, or Table 5-3 specifically. The appropriate vertical slice for
Y = 1 appears as the third column in Table 5-3; it is reproduced as the
second column in Table 5-4 below. The problem is that the joint probabilities
in this column do not sum to 1, hence they cannot represent a probability
distribution. They do, however, give us the relative probabilities of various
X values. Thus, if we know Y = 1, we know that X cannot be 0 or 3, but X
values of 1 or 2 are equally probable. Intuitively, therefore, we arrive at the
conditional probability distribution of X given Y = 1 as shown in the third
column. How did we get these numbers? Since all elements in column 2
summed to only 1/2, we simply doubled them all. The result (column 3)
must sum to 1; hence it is a bona fide probability distribution.
Table 5-4 Derivation of the Conditional Distribution of X, Given Y = 1

    Values of X      p(x, 1)               p(x/Y = 1)

        0               0                      0
        1              2/8                    1/2
        2              2/8                    1/2
        3               0                      0

                  Sum = Pr (Y = 1)         Sum = 1 ✓
                      = p(1)
                      = 1/2

Formally, doubling all elements in column 2 is justified rigorously by the
theory in Chapter 3, where conditional probability was found to be:

    $\Pr(H/G) = \frac{\Pr(H \cap G)}{\Pr(G)}$   (3-22) repeated

We merely substitute for G and H events defined in terms of random
variables, as follows:

    For H, substitute (X = x)
    For G, substitute (Y = 1)        (5-6)

Thus

    $\Pr(X = x/Y = 1) = \frac{\Pr(X = x \cap Y = 1)}{\Pr(Y = 1)}$

Using new notation

    $p(x/Y = 1) = \frac{p(x, 1)}{p(1)}$   (5-7)

In our example, p(1) = 1/2, so that (5-7) becomes

    $p(x/Y = 1) = 2\,p(x, 1)$   (5-8)

thus justifying the doubling in Table 5-4.
The generalization of (5-7) is clearly

    $p(x/Y = y) = \frac{p(x, y)}{p(y)}$   (5-9)

The conditional probability distribution may be further abbreviated to
p(x/y), giving

    $p(x/y) = \frac{p(x, y)}{p(y)}$   (5-10)

Note how similar this is to equation (3-22).
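
The division in (5-9) is easy to mechanize. The following sketch assumes the joint table is stored as a dictionary keyed by (x, y) with zero cells omitted; the function name `conditional_x_given_y` is made up for this illustration.

```python
from fractions import Fraction as F

# Joint probabilities p(x, y) copied from Table 5-3 (zero cells omitted).
joint = {(0, 0): F(1, 8), (1, 1): F(2, 8), (1, 2): F(1, 8),
         (2, 1): F(2, 8), (2, 2): F(1, 8), (3, 0): F(1, 8)}


def conditional_x_given_y(joint, y):
    """Return p(x/y) = p(x, y) / p(y) for the vertical slice Y = y."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)
    return {x: p / p_y for (x, yy), p in joint.items() if yy == y}


cond = conditional_x_given_y(joint, 1)
print(cond)                # X = 1 and X = 2 each get probability 1/2
print(sum(cond.values()))  # 1
```

Values of X that are impossible when Y = 1 (here 0 and 3) do not appear in the result, which corresponds to the zeros in the third column of Table 5-4.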
Since the conditional distribution is a bona fide distribution, it can be
used, for example, to obtain the conditional mean

    $E(X/Y = y)$ or $\mu_{X/y} = \sum_x x\,p(x/y)$   (5-11)

(d) Independence



We define the independence of 2 random variables by extending the
concept of the independence of 2 events developed in Chapter 3.

Definition.

    The random variables X and Y are called independent iff,
    for every x and y, the events (X = x) and (Y = y) are
    independent.   (5-12)

The consequences are easily derived. From (3-25) we know that the
independence of events (X = x) and (Y = y) means that

    $\Pr(X = x \cap Y = y) = \Pr(X = x)\,\Pr(Y = y)$

i.e.

    $p(x, y) = p(x)\,p(y)$   (5-13)

Returning to our example, we easily show that X and Y are not
independent. For independence, (5-13) must hold for every (x, y) combination.
We ask whether it holds, for example, when x = 0 and y = 0. The answer
is no; from the probabilities in Table 5-3, (5-13) is shown to be violated since

    $1/8 \ne (1/8)(2/8)$
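
Checking (5-13) cell by cell is tedious by hand but short in code. The sketch below is one possible check, assuming the joint table is a dictionary with zero cells omitted; `is_independent` and the second table `two_flips` (two separate fair coin flips, each coded 0 or 1) are hypothetical and serve only as a contrast.

```python
from fractions import Fraction as F


def is_independent(joint):
    """True if p(x, y) == p(x) * p(y) for every (x, y) combination, as in (5-13)."""
    xs = sorted({x for x, _ in joint})
    ys = sorted({y for _, y in joint})
    p_x = {x: sum(joint.get((x, y), 0) for y in ys) for x in xs}
    p_y = {y: sum(joint.get((x, y), 0) for x in xs) for y in ys}
    return all(joint.get((x, y), 0) == p_x[x] * p_y[y] for x in xs for y in ys)


# Table 5-3: number of heads X and number of changes Y in three tosses.
coins = {(0, 0): F(1, 8), (1, 1): F(2, 8), (1, 2): F(1, 8),
         (2, 1): F(2, 8), (2, 2): F(1, 8), (3, 0): F(1, 8)}

# Hypothetical contrast: two separate fair coin flips, each coded 0 or 1.
two_flips = {(x, y): F(1, 4) for x in (0, 1) for y in (0, 1)}

print(is_independent(coins))      # False
print(is_independent(two_flips))  # True
```

A single violating cell is enough to rule out independence, which is why the text needs only the (0, 0) cell.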



PROBLEMS 



5-1 In 4 tosses of a coin, again let

    X = number of heads
    Y = number of changes of sequence

List the sample space, and then find

(a) The bivariate probability function; illustrate with a dot graph as in
Figure 5-2c.

(b) The (marginal) probability function of X.

(c) The mean and variance of X.

(d) The conditional probability function p(x/Y = 2).

(e) The conditional mean and variance of X, given Y = 2.

(f) Are X and Y independent?
5-2 Suppose X and Y have the following joint probability function

              y
    x         2      4      6

    5        .10    .20    .10
    10       .15    .30    .15

Answer the same questions as in Problem 5-1.
5-3 Suppose X and Y have the following joint distribution

              y
    x         1      2      3

    0        .1     .1      0
    1        .1     .4     .1
    2         0     .1     .1

Answer the same questions as in Problem 5-1.

5-2 FUNCTIONS OF TWO RANDOM VARIABLES 

In Section 4-7, we analyzed a derived random variable R which was
some function of an (original) random variable X:

    $R = g(X)$   (5-14)

In this chapter we shall analyze a derived variable T which is some function
of a pair of random variables X, Y:

    $T = g(X, Y)$   (5-15)

The concepts and proofs of this section will therefore run parallel to those
of the previous chapter, the main difference being that the joint probability
function p(x, y) will replace the probability function p(x).

We shall be particularly interested in the distribution and mean of the
new variable T.

Example

Following our normal procedure, we develop the argument in terms of
simple examples, and then generalize. To use our example of tossing three
coins, shown in Figure 5-3, suppose S is just the sum¹ of X and Y. In this
specific case (5-15) becomes:

    $S = X + Y$   (5-16)

We use the symbol S in (5-16) rather than T to emphasize that this function
of X and Y is a very special case of (5-15), being a simple sum.

In Figure 5-3, we show how p(s), the probability function of S, may be
derived directly from the original sample space, or indirectly by means of
the joint probability function p(x, y). In either case, the result is the same.



FIG. 5-3 Two views of the derivation of the probability function of S = X + Y.
(a) Directly. (b) Using the joint probability function of X and Y as an
intermediate condensation. In either view the final result is

    s        0      1      2      3      4
    p(s)    1/8     0     2/8    4/8    1/8

¹ To give some motivation as to why this random variable might be of practical interest,
we may reinterpret the tossing of 3 coins as "having 3 children," and then consider X =
number of girls and Y = number of sex changes. Since girls are more expensive to clothe
than boys, and since sex changes interfere with the convenient passing on of clothing
from one child to the next child, we might interpret S = X + Y as a rough index of the
clothing costs for the family. Of course, a weighted average of X and Y might be even more
appropriate.
On the one hand, consider the direct derivation in Figure 5-3a. To
illustrate, we note that four of the eight equiprobable outcomes are associated
with S = 3. Hence p(3) = Pr (S = 3) is 4/8. Other S probabilities are
similarly evaluated.

On the other hand, in Figure 5-3b, p(3) may be evaluated indirectly, by
first deriving the joint probability function p(x, y). Then the three circled
(x, y) combinations all yield S = 3, and the sum of their probabilities is 4/8.

The expectation E(S) may similarly be derived in two ways. On the one
hand, applying (4-3) directly to the probability distribution of S, we have,
by definition:

    $E(S) = \sum_s s\,p(s)$   (5-17)

    $= 0(1/8) + 1(0) + 2(2/8) + 3(4/8) + 4(1/8)$

    $= 2\frac{1}{2}$

On the other hand, we may arrive at the same result by using the joint
distribution of X and Y. Specifically, we wonder if (4-17c) can be extended
to:

    $E(S) = E(X + Y) = \sum_x \sum_y (x + y)\,p(x, y)$   (5-18)

    $= (0 + 0)(1/8) + (0 + 1)(0) + (1 + 0)(0) + (1 + 1)(2/8) + \cdots$
    $\quad + (1 + 2)(1/8) + (2 + 1)(2/8) + (3 + 0)(1/8)$

    $= 2\frac{1}{2}$, the same result derived in (5-17).

So (5-18) does, in fact, work, at least in this example. Why? The last
3 terms of (5-18) amount to

    $3(1/8 + 2/8 + 1/8) = 3(4/8)$

which is the same as the second last term of (5-17). Continuing in this fashion,
we see that (5-18) is just a disguised form of the more condensed form
(5-17).

In a similar way we could prove generally

Theorem. If T = g(X, Y) is any function of two random variables, then

    $E(T) = E[g(X, Y)] = \sum_x \sum_y g(x, y)\,p(x, y)$   (5-19)

[compare (4-17c)].

For an example of how this works for a more complicated function of
X and Y, we return to the tossing of three coins, and consider

    $T = X^2 - 2Y$   (5-20)
Following the method of Figure 5-3(a), we can derive the following
probability distribution for T:

    Calculation of E(T), using (5-17)

      t       p(t)      t p(t)

     -3       1/8       -3/8
     -1       2/8       -2/8
      0       2/8         0
      2       2/8        4/8
      9       1/8        9/8

            Sum = 1 ✓   E(T) = 1

from which E(T) is directly calculated to be 1.

Alternatively, we could calculate E(T) from (5-19), using p(x, y) as
given in Table 5-3. Thus, noting (5-20):

    $E(T) = \sum_x \sum_y (x^2 - 2y)\,p(x, y)$

    $= (0^2 - 2(0))(1/8) + (0^2 - 2(1))(0) + (0^2 - 2(2))(0)$
    $\quad + (1^2 - 2(0))(0) + \cdots + (3^2 - 2(2))(0)$

    $= 1$
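
Both routes to an expectation can be compared numerically. The sketch below assumes the joint probabilities of Table 5-3 and uses two hypothetical helpers: `expect` applies (5-19) cell by cell, while `distribution` first condenses the joint table into the probability function of the derived variable.

```python
from collections import defaultdict
from fractions import Fraction as F

# Joint probabilities p(x, y) copied from Table 5-3 (zero cells omitted).
joint = {(0, 0): F(1, 8), (1, 1): F(2, 8), (1, 2): F(1, 8),
         (2, 1): F(2, 8), (2, 2): F(1, 8), (3, 0): F(1, 8)}


def expect(g, joint):
    """E[g(X, Y)] as the sum of g(x, y) p(x, y) over all cells, as in (5-19)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


def distribution(g, joint):
    """Probability function of T = g(X, Y), collecting cells with equal t."""
    p_t = defaultdict(F)
    for (x, y), p in joint.items():
        p_t[g(x, y)] += p
    return dict(p_t)


# Route 1: find p(s) first, then take the ordinary mean of S.
p_s = distribution(lambda x, y: x + y, joint)
mean_s_direct = sum(s * p for s, p in p_s.items())

# Route 2: apply (5-19) to the joint table without ever forming p(s).
mean_s_joint = expect(lambda x, y: x + y, joint)
mean_t_joint = expect(lambda x, y: x**2 - 2 * y, joint)

print(mean_s_direct, mean_s_joint)  # 5/2 5/2
print(mean_t_joint)                 # 1
```

The two routes agree because grouping cells with the same value of the function only rearranges the terms of the same sum.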



PROBLEMS 

5-4 Let

    U = X(X + Y)
    V = (X - 8)(Y - 4)

where X and Y have the same joint distribution as in Problem 5-2,
namely

              y
    x         2      4      6

    5        .10    .20    .10
    10       .15    .30    .15

(a) Find the distribution of U, and from this its mean.

(b) Find the mean of U using (5-19).

(c) Find E(V).
(5-5) Let

    U = XY - 1
    V = (X - 1)(Y - 2)

where X and Y have the same joint distribution as in Problem 5-3,
namely

              y
    x         1      2      3

    0        .1     .1      0
    1        .1     .4     .1
    2         0     .1     .1

(a) Find the distribution of U, and from this its mean.

(b) Find the mean of U using (5-19).

(c) Find E(V).



5-3 COVARIANCE 

This is a measure of the degree to which two variables are linearly
related. As an example, consider the joint probability function of Table 5-5,
graphed in Figure 5-4a. We notice some tendency for these two variables
to move together (i.e., a large X tends to be associated with a large Y, and a
small X with a small Y).

Our measure of how the variables move together should be independent
of our choice of origin. It will, therefore, be convenient in Figure 5-4b to
translate both axes from the (0, 0) origin to $\mu_X$ and $\mu_Y$ (which are calculated
to be 3 and 3); this means defining two new variables

    $X - \mu_X$ and $Y - \mu_Y$

Now suppose we multiply the new coordinate values together,

    $(X - \mu_X)(Y - \mu_Y)$



Table 5-5 Joint Probability p(x, y)

              y
    x         1      2      3      4      5

    1        .1      0      0      0      0
    2         0     .2      0     .1      0
    3         0      0     .2      0      0
    4         0     .1      0     .2      0
    5         0      0      0      0     .1
FIG. 5-4 Translation of axes. (a) Original. (b) Axes translated to the center of the
distribution.

For any point in quadrant I in Figure 5-4b both its $(X - \mu_X)$ coordinate
and $(Y - \mu_Y)$ coordinate will be positive; hence this product will be positive.
It will also be positive for any point in the third quadrant, since both factors
are negative. But for points in the other two quadrants the product is
negative. If we sum all of these, attaching the appropriate probability weights to
each, i.e.

    $\sum_x \sum_y (x - \mu_X)(y - \mu_Y)\,p(x, y)$   (5-21)

this gives us a good measure of how the variables move together, and is in
fact $\sigma_{XY}$, the covariance of X and Y.

In our example, the heavier probability weights appear in quadrants I
and III; thus the positive terms in this calculation will outweigh the negative.
Consequently, covariance will be positive,² indicating, as expected, some
tendency for the variables to move together. Alternatively, if the larger
probabilities had occurred in quadrants II and IV, covariance would be
negative, indicating the tendency for X and Y to move in opposite directions.
Finally, had the probabilities been evenly distributed in the four quadrants,
there would be no discernible tendency for X and Y to move together and,
as expected, their covariance would be zero.

We notice that (5-21) is equivalent to the following formal definition.³



Definition.

    Covariance of X and Y:
    $\sigma_{XY} = E(X - \mu_X)(Y - \mu_Y)$   (5-22)



² Calculated as follows:

    $\sigma_{XY} = (-2)(-2)(.1) + (-1)(-1)(.2) + (-1)(+1)(.1) + (0)(0)(.2) + (+1)(-1)(.1)$
    $\quad + (+1)(+1)(.2) + (+2)(+2)(.1) = +1.0$

³ The computation of $\sigma_{XY}$ may often be simplified by using

    $\sigma_{XY} = E(XY) - \mu_X \mu_Y$   (5-23)

This formula, with its proof, is analogous to (4-5):

    $\sigma_X^2 = E(X^2) - \mu_X^2$   (5-24)

Proof of (5-23): beginning with (5-21),

    $\sigma_{XY} = \sum_x \sum_y (x - \mu_X)(y - \mu_Y)\,p(x, y)$

    $= \sum_x \sum_y (xy - x\mu_Y - y\mu_X + \mu_X \mu_Y)\,p(x, y)$

    $= \sum_x \sum_y xy\,p(x, y) - \mu_Y \sum_x \sum_y x\,p(x, y) - \mu_X \sum_x \sum_y y\,p(x, y) + \mu_X \mu_Y \sum_x \sum_y p(x, y)$   (5-25)

In the second term, we find that

    $\sum_x \sum_y x\,p(x, y) = \sum_x x \left[ \sum_y p(x, y) \right]$

and by (5-5),

    $= \sum_x x\,p(x)$

    $= \mu_X$   (5-26)

Similarly, in the third term of (5-25),

    $\sum_x \sum_y y\,p(x, y) = \mu_Y$   (5-27)

Finally, in the last term of (5-25),

    $\sum_x \sum_y p(x, y) = 1$

Thus (5-25) reduces to

    $\sigma_{XY} = \sum_x \sum_y xy\,p(x, y) - \mu_Y(\mu_X) - \mu_X(\mu_Y) + \mu_X \mu_Y$

    $= E(XY) - \mu_X \mu_Y$   (5-23) proved
The variance of X [ref. (4-19)] is recognized as just a special case of this,
being the covariance of X with itself.

Since $\sigma_{XY}$ measures the extent to which the two variables move together,
we find it plausible (indeed, it may be proved⁴) that

Theorem.

    If X and Y are independent, $\sigma_{XY} = 0$   (5-28)
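
As a numerical check on the covariance of Table 5-5, the sketch below computes it twice: once from the definition (5-22) and once from the shortcut (5-23). The helper name `expect` is hypothetical, and the table is entered with its zero cells omitted.

```python
from fractions import Fraction as F

# Joint probabilities p(x, y) copied from Table 5-5 (zero cells omitted).
joint = {(1, 1): F(1, 10), (2, 2): F(2, 10), (2, 4): F(1, 10),
         (3, 3): F(2, 10), (4, 2): F(1, 10), (4, 4): F(2, 10),
         (5, 5): F(1, 10)}


def expect(g, joint):
    """E[g(X, Y)] as the sum of g(x, y) p(x, y) over all cells."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


mu_x = expect(lambda x, y: x, joint)
mu_y = expect(lambda x, y: y, joint)

# Definition (5-22): expected product of deviations from the means.
cov_definition = expect(lambda x, y: (x - mu_x) * (y - mu_y), joint)

# Shortcut (5-23): E(XY) minus the product of the means.
cov_shortcut = expect(lambda x, y: x * y, joint) - mu_x * mu_y

print(mu_x, mu_y)                    # 3 3
print(cov_definition, cov_shortcut)  # 1 1
```

Exact fractions are used instead of floating point so that the two formulas can be compared with `==` without rounding concerns.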



PROBLEMS 

5-6 For the following joint probability table,

              y
    x         0      1      2

    0        .2     .2      0
    1         0     .4     .2

calculate $\sigma_{XY}$

(a) From the definition (5-22).

(b) From the easier formula (5-23).

(5-7) Repeat Problem 5-6, for the following joint probability distribution:



X \ 


0 1 


2 




0 




.4 


J 


1 








2 


.2 .2 . 




e- 
'"I 



⁴ Proof. If X and Y are independent, then

    $p(x, y) = p(x)\,p(y)$   (5-13) repeated

Thus (5-21) becomes

    $\sigma_{XY} = \sum_x \sum_y (x - \mu_X)(y - \mu_Y)\,p(x)\,p(y)$

    $= \left[ \sum_x (x - \mu_X)\,p(x) \right] \left[ \sum_y (y - \mu_Y)\,p(y) \right]$

    $= 0 \cdot 0 = 0$   (5-28) proved

5-8 Suppose X and Y have the following joint distribution:

              y
    x         1      2

    1        .40    .10
    2        .20    .05
    3        .20    .05

(a) Find p(x) and p(y); then by verifying that p(x)p(y) = p(x, y),
confirm that X and Y are independent [ref. equation (5-13)].

(b) What is $\sigma_{XY}$?

5-9 (a) Referring to Problems 5-4 and 5-5, is it true that E(V) = $\sigma_{XY}$?
(b) Referring to Problem 5-1, find

(1) $\sigma_{XY}$

(2) E(X + Y).

*5-10

              y
    x         0      1      2

    0        .1     .3     .1
    1        .2     .1     .2

(a) Find the probability function of X and the probability function
of Y. Compute E(X) and E(Y).

(b) Are X and Y independent?

(c) Calculate $\sigma_{XY}$.

(d) Which statements are true, for any X and Y?

(1) If X and Y are independent, then $\sigma_{XY}$ must be zero.

(2) If $\sigma_{XY}$ = 0, then X and Y must be independent.

=> 5-11 In a certain gambling game, a pair of honest three-sided dice are
thrown. Let

    $X_1$ = number on first die
    $X_2$ = number on the second die

The joint probability distribution of $X_1$ and $X_2$ is, of course

              x_2
    x_1       1      2      3

    1        1/9    1/9    1/9
    2        1/9
    3

The total number of dots S is:

    $S = X_1 + X_2$

(a) Find the distribution of S, and its mean and variance.

(b) Find the mean and variance of $X_1$ and $X_2$.

(c) Do you see the relation between (a) and (b)?



5-12 Suppose the gambling game of Problem 5-11 is complicated by
using loaded dice, as follows:

    x_1     p(x_1)          x_2     p(x_2)

     1       .4              1       .5
     2       .3              2       .4
     3       .3              3       .1

Assuming that the dice are tossed independently, tabulate the joint
distribution of $X_1$ and $X_2$, and then answer the same questions as in
Problem 5-11.



5-4 LINEAR COMBINATION OF TWO RANDOM VARIABLES 
(a) Mean 

First, we take leave of more complicated functions, and return to the
simple example of Section 5-2 in which S was just the sum of X and Y.
When we calculated E(S) the student's suspicions may have been aroused;
the mean of S (2 1/2) turned out to be simply the sum of the mean of X (1 1/2)
and the mean of Y (1). Moreover, this was exactly the conclusion in the
problems. In fact, for any X and Y, it may be proved⁵ that

Theorem.

    $E(X + Y) = E(X) + E(Y)$   (5-29)

Mathematicians often refer to this important property as the "additivity"
or "linearity" of the expectation operator. It may be easily generalized to
cover the case of a "weighted sum"

    $W = aX + bY$   (5-30)

where a and b are any two constants. W is also known as a "linear
combination of X and Y." For example, S = X + Y is just the special case in which
a = b = 1. As another example, the average of two random numbers X and
Y is (X + Y)/2 = (1/2)X + (1/2)Y, which is just a weighted sum with weights
1/2. Similarly, any weighted average is just a linear combination with a and
b satisfying a + b = 1.

We might guess that if we know the average of X and the average of Y,
we might plug these into (5-30) to find the average of W. Fortunately this
simple operation is always justified; thus⁶

Theorem.

    $E(W) = E(aX + bY) = aE(X) + bE(Y)$   (5-31)

As a review, the student should compare (5-19) and (5-31). Both provide
a means of calculating the expected value of a function of X and Y. However,
(5-19) applies to any function of X and Y, whereas (5-31) is restricted to linear
functions only. When we are dealing with this restricted class of linear
functions, (5-31) is generally preferred to (5-19) because it is much simpler.
Whereas evaluation of (5-19) involves working through the whole joint
probability distribution of X and Y (e.g., Table 5-3), (5-31) requires only the
marginal distributions of X and Y (e.g., the last row and column of that table).

⁵ Proof. For S = X + Y, (5-19) becomes

    $E(X + Y) = \sum_x \sum_y (x + y)\,p(x, y)$

    $= \sum_x \sum_y x\,p(x, y) + \sum_x \sum_y y\,p(x, y)$

Considering the first term, we may write it as

    $\sum_x \sum_y x\,p(x, y) = \sum_x x \left[ \sum_y p(x, y) \right]$

by (5-5)

    $= \sum_x x\,p(x)$

    $= E(X)$

Similarly the second term reduces to E(Y), so that

    $E(X + Y) = E(X) + E(Y)$   (5-29) proved

⁶ Since the proof parallels the proof of (5-29), it is left as an exercise.

(b) Variance

Again, we consider a simple sum first, and any linear combination
later. The variance of a sum is a little more complicated than its mean. It
may be proved⁷ that

Theorem.

    $\mathrm{var}(X + Y) = \mathrm{var}\,X + \mathrm{var}\,Y + 2\,\mathrm{cov}(X, Y)$   (5-32)

where var X and cov (X, Y) are alternate notations for $\sigma_X^2$ and $\sigma_{XY}$
respectively. An interesting simplification occurs when X and Y have zero
covariance (are "uncorrelated"); this occurs whenever X and Y are
independent, for example in the dice Problems 5-11 and 5-12. Then (5-32)
simplifies to:

Corollary.

    If X and Y are uncorrelated,
    $\mathrm{var}(X + Y) = \mathrm{var}\,X + \mathrm{var}\,Y$   (5-33)

Finally, (5-32) may be generalized to any linear combination.⁸

Theorem.

    $\mathrm{var}(aX + bY) = a^2\,\mathrm{var}\,X + b^2\,\mathrm{var}\,Y + 2ab\,\mathrm{cov}(X, Y)$   (5-34)

This and the other theorems of this section are summarized in Table 5-6, a
very important table for future reference. The general function g(X, Y) is
dealt with in the first row, while the succeeding rows represent increasingly
restricted special cases.

⁷ Proof. It is time to simplify our proofs by using brief notation such as E(W) rather than
the awkward $\sum_w w\,p(w)$, or the even more awkward $\sum_x \sum_y w(x, y)\,p(x, y)$. First, from
(4-19),

    $\mathrm{var}\,S = E(S - \mu_S)^2$

Substituting for S and $\mu_S$,

    $\mathrm{var}\,S = E[(X + Y) - (\mu_X + \mu_Y)]^2$

    $= E[(X - \mu_X) + (Y - \mu_Y)]^2$

    $= E[(X - \mu_X)^2 + 2(X - \mu_X)(Y - \mu_Y) + (Y - \mu_Y)^2]$

where each of the three terms inside the brackets is a random variable.
Realizing that (5-31) holds for any random variables,

    $\mathrm{var}\,S = E(X - \mu_X)^2 + 2E(X - \mu_X)(Y - \mu_Y) + E(Y - \mu_Y)^2$

    $= \mathrm{var}\,X + 2\,\mathrm{cov}(X, Y) + \mathrm{var}\,Y$   (5-32) proved



Table 5-6 Summary of the Mean and Variance of Various Functions
of the Random Variables X and Y

1. Any function g(X, Y)
       Mean:      $E[g(X, Y)] = \sum_x \sum_y g(x, y)\,p(x, y)$   (5-19)

2. Linear combination aX + bY
       Mean and variance derived by: Row 1
       Mean:      $E(aX + bY) = aE(X) + bE(Y)$   (5-31)
       Variance:  $\mathrm{var}(aX + bY) = a^2\,\mathrm{var}\,X + b^2\,\mathrm{var}\,Y + 2ab\,\mathrm{cov}(X, Y)$   (5-34)

3. Simple sum X + Y
       Mean and variance derived by: Setting a = b = 1 in row 2
       Mean:      $E(X + Y) = E(X) + E(Y)$   (5-29)
       Variance:  $\mathrm{var}(X + Y) = \mathrm{var}\,X + \mathrm{var}\,Y + 2\,\mathrm{cov}(X, Y)$   (5-32)

4. Function of one variable, aX
       Mean and variance derived by: Setting b = 0 in row 2
       Mean:      $E(aX) = aE(X)$   (ref. Table 4-2)
       Variance:  $\mathrm{var}(aX) = a^2\,\mathrm{var}\,X$   (Table 4-2)

⁸ Since the proof parallels the proof of (5-32), it is left as an exercise. Note also that (5-34)
has a corollary similar to (5-33).
Example

Suppose we choose a family at random from a certain population,
letting

    B = number of boys in the family
    G = number of girls in the family

so that C = B + G = number of children.
Suppose it is known that

    E(B) = 1.2        var B = 2.0
    E(G) = 1.1        var G = 2.2
    cov (B, G) = 0.3

Then we can calculate the average number of children, and the variance:
From (5-29)

    E(C) = 1.2 + 1.1 = 2.3

From (5-32)

    var (C) = 2.0 + 2.2 + 2(0.3) = 4.8
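
The variance formulas can also be checked against a full joint distribution. The sketch below is only an illustration: it reuses the joint probabilities of Table 5-5, picks the weights a = 2 and b = -1 arbitrarily, and compares the variance of aX + bY computed directly from the joint table with the value given by (5-34). The helper names `expect` and `var_of` are hypothetical.

```python
from fractions import Fraction as F

# Joint probabilities p(x, y) copied from Table 5-5 (zero cells omitted).
joint = {(1, 1): F(1, 10), (2, 2): F(2, 10), (2, 4): F(1, 10),
         (3, 3): F(2, 10), (4, 2): F(1, 10), (4, 4): F(2, 10),
         (5, 5): F(1, 10)}


def expect(g, joint):
    """E[g(X, Y)] as the sum of g(x, y) p(x, y) over all cells, as in (5-19)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


def var_of(g, joint):
    """Variance of g(X, Y): expected squared deviation from its own mean."""
    m = expect(g, joint)
    return expect(lambda x, y: (g(x, y) - m) ** 2, joint)


mu_x = expect(lambda x, y: x, joint)
mu_y = expect(lambda x, y: y, joint)
var_x = var_of(lambda x, y: x, joint)
var_y = var_of(lambda x, y: y, joint)
cov = expect(lambda x, y: (x - mu_x) * (y - mu_y), joint)

# Simple sum, formula (5-32), against the direct calculation.
var_sum_direct = var_of(lambda x, y: x + y, joint)
var_sum_formula = var_x + var_y + 2 * cov

# Linear combination with arbitrarily chosen weights, formula (5-34).
a, b = 2, -1
var_w_direct = var_of(lambda x, y: a * x + b * y, joint)
var_w_formula = a**2 * var_x + b**2 * var_y + 2 * a * b * cov

print(var_sum_direct, var_sum_formula)  # 24/5 24/5
print(var_w_direct, var_w_formula)      # 3 3
```

Note that the formula side needs only var X, var Y, and cov (X, Y), just as the family example above needed only five summary numbers and not the joint distribution of B and G.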

PROBLEMS 



5-13 Continuing Problems 5-11 and 5-12, suppose the pair of 3-sided dice
are not only loaded, but dependent, so that the joint probability
function of the 2 numbers is

              x_2
    x_1       1      2      3

    1        .1     .1     .1
    2        .1     .1     .2
    3        .1     .1     .1

(a) Find the distribution of S (the total number of dots), and its
mean and variance.

(b) Find the mean and variance of $X_1$ and of $X_2$.

(c) Find the covariance of $X_1$ and $X_2$, and then verify that (5-29)
and (5-32) hold true.

(5-14) When a coin is fairly tossed 3 times, let

    X = number of heads on the first two coins
    Y = number of heads on the last coin
    Z = total number of heads

(a) Are X and Y independent? What is their covariance?

(b) For each of X, Y, and Z, find the distribution, mean, and variance.

(c) Verify that (5-29) and (5-32) hold true.

(5-15) Repeat Problem 5-14 for a coin (Problem 3-26) which is not fairly
tossed, having in fact the following sample space:

    e            Pr (e)

    (H H H)      .15
    (H H T)      .10
    (H T H)      .10
    (H T T)      .15
    (T H H)      .15
    (T H T)      .10
    (T T H)      .10
    (T T T)      .15



5-16 The students of a certain large class wrote 2 exams, each time
obtaining a distribution of grades, with the following characteristics:

                                Class      Standard
                                Mean       Deviation    Variance
                                $\mu$      $\sigma$     $\sigma^2$

    1st exam, $X_1$              50          20           ?
    2nd exam, $X_2$              80          20           ?
                                             covariance $\sigma_{12}$ = 50
    (a) Average, $\bar{X}$        ?           ?           ?
    (b) Weighted average, W       ?           ?           ?

Fill in the blanks in the table, assuming

(a) The instructor calculated a simple average of the two grades,

    $\bar{X} = (X_1 + X_2)/2$

(b) The instructor thought the second exam was twice as important,
so took a weighted average

    $W = \frac{1}{3}X_1 + \frac{2}{3}X_2$

5-17 Repeat Problem 5-16, if the covariance is -200. How might you
interpret such a negative covariance? What has it done to the variance
of the average grade?

(5-18) Repeat Problem 5-16, if the covariance is 0.
Review Problems

5-19 If X and Y have the following joint probability function

              y
    x         5      6      7

    5        .1     .3     .1
    6        .1     .1     .3

Find the probability distribution and mean of

(a) X.

(b) Y.

(c) The sum S = X + Y.

(d) Y, given X = 5.

(e) Are X and Y independent? Briefly, why?

(f) Find Pr (X < Y).

5-20 In a small community of ten working couples, yearly income (in
thousands of dollars) has the following distribution:

    Couple     Man's Income     Wife's Income

       1           10                5
       2           15               15
       3           15               10
       4           10               10
       5           10               10
       6           15                5
       7           20               10
       8           15               10
       9           20               15
      10           20               10

A couple is drawn by lot to represent the community at a
convention. Let M and W be the (random) income of the man and wife
respectively. Find:

(a) The bivariate probability distribution, and its dot graph.

(b) The probability distribution of M; also $\mu_M$ and $\sigma_M^2$.

(c) The probability distribution of W; also $\mu_W$ and $\sigma_W^2$.

(d) The covariance $\sigma_{MW}$.

(e) E(W/M = 10), E(W/M = 20). Note that as M increases, the
conditional mean of W increases too. This is another expression of the
"positive relation" between M and W.

(f) If C represents the total combined income of the man and wife,
what is its mean and variance?

(g) What is Pr (C > 25)?

(h) If income is taxed a straight 20 percent, what is the mean and
variance of the tax on a couple's income?

(i) If the income of a couple is taxed according to the following
progressive tax table, what is the mean and variance of the tax?



    Combined Income     Tax

          10              1
          15              2
          20              3
          25              5
          30              7
          35             10
          40             13

(5-21) Ten people in a room have the following heights and weights

    Person     Height (inches)     Weight (pounds)

      A             70                 150
      B             65                 140
      C             65                 150
      D             75                 160
      E             70                 150
      F             70                 140
      G             65                 140
      H             75                 150
      I             75                 160
      J             70                 160

For a person drawn by lot (with height H and weight W), find:

(a) The bivariate probability distribution, and graph it.

(b) The probability distribution of H, and its mean and variance.

(c) The probability distribution of W, and its mean and variance.

(d) The covariance, $\sigma_{HW}$.

(e) E(W/H = 65), E(W/H = 70), E(W/H = 75).
(As height increases, the conditional mean weight increases, which is
another view of the positive covariance of H and W.)

(f) Are H and W independent?

(g) If a "size index" I were defined as

    I = 2H + 3W

find the mean, variance and standard deviation of I; then find the
distribution of I and verify directly.
5-22 Suppose a game involves dropping 3 coins on the table: a nickel, a
dime, and a quarter. Each coin that lands "heads up" you are allowed
to keep, so that the possible reward R ranges from 0 to 40 cents.

(a) List the sample space. 

(b) What is the distribution of R, its mean, and variance? 

We shall now work through an alternate way to find the mean and 
variance of R, without going to the trouble of finding its exact distribu- 
tion. To begin with, let us define 

    $X_1$ = the nickel's contribution to the reward
    $X_2$ = the dime's contribution to the reward
    $X_3$ = the quarter's contribution to the reward

Thus

    $R = X_1 + X_2 + X_3$   (5-35)

(c) What is the distribution of $X_1$, its mean, and variance?

(d) Similarly find the distribution, mean, and variance of $X_2$ and $X_3$.

(e) Apply (5-29) and (5-33) to find E(R) and var (R).

(5-23) Continuing, suppose that instead of 3 coins, there were 4 coins 
dropped on the table — a nickel, a dime, and 2 quarters. Answer the 
same questions as in Problem 5-22. 
5-24 Continuing Problem 5-22, suppose that instead of 3 coins, we dropped 
3 nickels, 2 dimes, and 5 quarters. What is the range, mean, and 
variance of Rl 

=> 5-25 A bowl contains 6 chips numbered from 1 to 6. One chip is selected 
at random and then a second is selected (random sampling without 
replacement). Let X 1 and X 2 be the first and second numbers drawn. 

(a) Tabulate the joint probability function of X x and X 2 . 

(b) Tabulate the (marginal) probability functions of X x and X 2 , 

(c) Are X x and X 2 independent? 

(d) What is the covariance of X x and X 2 1 

(e) Find the mean and variance of X 1 and X 2 . 

(f) Find the mean and variance of S = X 1 + X 2 in two different ways. 
5-26 Repeat Problem 5-25 with the following change. The first chip is
drawn and recorded, then replaced in the bowl before the second is
drawn (random sampling with replacement). Isn't this sampling
problem (with replacement) mathematically identical to tossing a die
twice?

5-27 Let Y be the total number of dots showing when 10 fair dice are
tossed.

(a) What are the mean and variance of Y?

(b) What is the range of possible values of Y?
5-28

(a) A bowl contains 50 chips numbered 0, and 50 chips numbered 1.
A sample of two chips is drawn with replacement; the sum is denoted
by S. Tabulate the probability function of S. What are the mean and
variance of S?

(b) Repeat for a sample of three chips. 

(c) Repeat for a sample of five chips. 

(d) Do you recognize the probability functions in (a), (b), and (c)? 



chapter 6 

Sampling 



6-1 INTRODUCTION 

In the last three chapters we have analyzed probability and random
variables; we shall now employ this essential theory to answer the basic
deductive question in statistics: "What can we expect of a random sample
drawn from a known population?"

We have already met several examples of sampling: the poll of voters 
sampled from the population of all voters; the sample of light bulbs drawn 
from the whole production of bulbs; a sample of men's heights drawn from 
the whole population; a sample of 2 chips drawn from a bowl of chips 
(Problem 5-25). All of these are sampling without replacement; an individual 
once sampled, is out. Since he is no longer part of the population he cannot 
appear again in the sample. On the other hand, sampling with replacement 
involves returning any sampled individual to the population. The population 
remains constant; hence any individual may appear more than once in a 
sample, as in Problems 5-26 and 5-28. Polls of voters are typically samples 
without replacement; but there is no reason why a poll could not be taken 
with replacement. Thus no record would be kept of those already selected, 
and, for example, John Q. Smith of Cincinnati might vote twice in the poll — a 
privilege he will not enjoy on election day. 

As defined earlier, a random sample is one in which each individual in 
the population is equally likely to be sampled. There are several ways to 
actually carry out the physical process of random sampling. For example, 
suppose a random sample is to be drawn from the population of students 
in the classroom. 

1. The most graphic method is to put each person's name on a card- 
board chip, mix all these chips in a large bowl and then draw the sample. 

2. A more practical method is to assign each person a number, and then 
draw a random sample of numbers. Thus for a population of less than a 
hundred, 2-digit numbers suffice. A random 2-digit number may be obtained
by throwing a 10-sided die twice, or by consulting a table of random numbers
(Appendix Table II) and reading off a pair of digits for each individual
required in the sample.

These two sampling methods are mathematically equivalent. Method 2
is simpler to employ, hence it is used in practical sampling. However, the
first method is conceptually easier to deal with and to visualize; consequently
in our theoretical development of random sampling, we talk of drawing
chips from a bowl. Moreover, if we are studying men's heights, then the
height alone is all that is required on the chip and the man's name is irrelevant.
Hence we can view the population simply as a collection of numbered chips
in a bowl, which is stirred and then sampled.
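
The two ways of drawing numbered chips, with and without replacement, can be sketched in a few lines of Python. This is only an illustration: the class of 40 students, the function name `draw_sample`, and the fixed seed are assumptions made for the sketch, and a pseudo-random generator stands in for the 10-sided die or the table of random numbers.

```python
import random

def draw_sample(population, n, replace, seed=0):
    """Draw n individuals at random, with or without replacement."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    if replace:
        # Every draw is made from the full bowl, so repeats are possible.
        return [rng.choice(population) for _ in range(n)]
    # rng.sample never returns the same individual twice.
    return rng.sample(population, n)

# Hypothetical class of 40 students, each assigned a number (Method 2).
students = list(range(40))
without = draw_sample(students, 5, replace=False)
with_repl = draw_sample(students, 5, replace=True)
print("without replacement:", without)
print("with replacement:   ", with_repl)
```

In the sample drawn without replacement no student can appear twice; in the sample drawn with replacement a repeat is possible, although with 40 students and only 5 draws it is not likely.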

How can random sampling be mathematically specified? If we draw one
chip at random, its number can be regarded as a random variable taking on 
values that range over the whole population of chip values, with probabilities 
corresponding to the relative frequencies in the population. 

As an example, suppose a population of 80 million men's heights has
the frequency distribution shown in Table 6-1. For future reference, we also
compute $\mu$ and $\sigma^2$ from Table 6-1, and call them the mean and variance
of X, where X represents the parent population of men's heights.



Table 6-1 A Population of Men's Heights¹

(1)                     (2)              (3)
Height                                   Relative
(midpoint of cell)                       Frequency,
x                       Frequency        also p(x)

51                         825,000       .01
54                         791,000       .01
57                       2,369,000       .03
60                       5,505,000       .07
63                       9,483,000       .12
66                      16,087,000       .20
69                      20,113,000       .25
72                      14,480,000       .18
75                       7,891,000       .10
78                       1,633,000       .02
81                         823,000       .01

                    Σ = 80,000,000       Σ = 1.00

1 We approximate each height by the cell midpoint to keep concepts simple. To be more
precise, we ought to have used a very fine subdivision of height into many cells, as in Figure
4-8c.
From (4-3):

$\mu = 51(.01) + 54(.01) + \cdots + 81(.01) = 67.8$

From (4-4):

$\sigma^2 = (51 - 67.8)^2(.01) + (54 - 67.8)^2(.01) + \cdots + (81 - 67.8)^2(.01) = 28.4$
$\sigma = 5.3$
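
The arithmetic behind these two moments can be checked with a short Python sketch. The lists below simply transcribe columns (1) and (3) of Table 6-1; the variable names are our own.

```python
# Cell midpoints (column 1) and relative frequencies (column 3) of Table 6-1.
heights = [51, 54, 57, 60, 63, 66, 69, 72, 75, 78, 81]
p = [.01, .01, .03, .07, .12, .20, .25, .18, .10, .02, .01]

# Population mean: sum of x * p(x), as in (4-3).
mu = sum(x * px for x, px in zip(heights, p))

# Population variance: sum of (x - mu)^2 * p(x), as in (4-4).
var = sum((x - mu) ** 2 * px for x, px in zip(heights, p))
sigma = var ** 0.5

print(round(mu, 1), round(var, 1), round(sigma, 1))
```

Rounded to one decimal place, the results agree with the values 67.8, 28.4, and 5.3 quoted above.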

Random sampling from this population is equivalent mathematically to
placing the 80 million chips of column 2 in a bowl with each chip carrying
the x value shown in column 1. The first chip selected at random can take on
any of these x values, with probabilities shown in column 3. This random
variable we designate as $X_1$; the second draw is designated as the random
variable $X_2$, and so on. But each of these random variables $X_1, X_2, \ldots, X_n$
(together representing our sample of n chips) has the same probability
distribution p(x), the distribution of the parent population; that is,²

$p(x_1) = p(x_2) = p(x_3) = \cdots = p(x_n)$ (6-1)

This equality, of course, holds true if we sample with replacement, since the
second chip is drawn from exactly the same bowlful as is the first chip, etc.
Fortunately, (6-1) also holds true for sampling without replacement, even
though $X_1, X_2, \ldots, X_n$ are now dependent; since this is not at all obvious, we
must show why.

We have already noted that the distribution of $X_1$ is the same as the
distribution of the population. However, the conditional distribution of $X_2$
given $X_1$ is not the same. Once that first sample value has been taken from the
population (and not replaced), the population changes,³ along with relative
frequencies (probabilities). Thus $X_2$ is dependent on $X_1$; or to restate, the
conditional distribution of $X_2$ will depend on the value of $X_1$ selected in the
first draw. But this is not the issue in (6-1). In that equation $p(x_2)$ is the
distribution of $X_2$, which is not the conditional distribution, but rather the
marginal distribution of $X_2$, without any condition, i.e., without any
knowledge of $X_1$. And if we have no knowledge of $X_1$ and consider the distribution
of $X_2$, there is no reason for it to differ from the distribution of $X_1$.

Our intuition in this case is a good guide. We could formally confirm
this result by considering the full table showing the joint probability function
of $X_1$ and $X_2$. It is symmetric around its main diagonal; hence although
conditional distributions (rows or columns) vary in this table, the marginal
2 Strictly speaking, (6-1) is not precise enough. It would be more accurate to let $p_1$ denote
the probability function of $X_1$, $p_2$ of $X_2$, etc., and then write

$p_1(x) \equiv p_2(x) \equiv \cdots \equiv p_n(x)$

where $\equiv$ means "identically equal for all x."

3 In our example, with a population of 80 million heights, this change would be of no 
practical consequence. But with smaller populations it might. 
distributions of $X_1$ and of $X_2$ are necessarily identical. (See Problem 5-25b.)
Thus equation (6-1) holds true, even in the case of sampling without
replacement.

Before leaving Table 6-1, we have one further observation. When the
parent population is extremely large, such as 80 million, sampling without
replacement is practically the same as sampling with replacement. It hardly
matters whether the individual sampled is replaced in the population or
not; one individual hardly changes the frequencies in column 2 or the relative
frequencies in column 3.

Thus the second draw ($X_2$) will be practically independent of the first
($X_1$). This leads us to the conclusion that sampling without replacement from
an infinite population is equivalent to sampling with replacement; this is
important enough that we shall return to it in Section 6-5.

Conclusion. Any population to be sampled may be simulated by a
bowl of chips, with the following mathematical characteristics:

1. The number on the first chip drawn is a random variable $X_1$, with a
distribution identical to the distribution of the population random variable X.

2. The sample of n chips gives us n random variables ($X_1, X_2, \ldots, X_n$).
Each $X_i$ has the same (marginal) distribution: that of the population X.
This fundamental characteristic (6-1) holds in all cases, regardless of sample
replacement or population size. However, the independence of $X_1, X_2, \ldots, X_n$
is a more complex issue. If the population is finite and sampling is without
replacement, then the $X_i$ are dependent, since the conditional distribution
of any $X_i$ depends on the previous X values drawn. In all other cases the
$X_i$ are independent; for simplicity, we shall assume this independence in the
rest of the book (except Section 6-5).
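
The claim that $X_1$ and $X_2$ share the same marginal distribution yet are dependent can be checked by brute force on the small bowl of Problem 5-25 (6 chips, sampling without replacement). The following Python sketch enumerates all ordered pairs; the variable names are ours, and exact fractions are used so that no rounding obscures the comparison.

```python
from fractions import Fraction
from itertools import permutations

chips = [1, 2, 3, 4, 5, 6]

# All ordered (first, second) draws without replacement: 6 * 5 = 30 pairs,
# each equally likely.
pairs = list(permutations(chips, 2))
p_pair = Fraction(1, len(pairs))

# Marginal distributions: sum the joint probabilities over the other draw.
marg1 = {c: sum(p_pair for a, b in pairs if a == c) for c in chips}
marg2 = {c: sum(p_pair for a, b in pairs if b == c) for c in chips}

# Covariance of the two draws; a nonzero value shows dependence.
mean1 = sum(a * p_pair for a, b in pairs)
mean2 = sum(b * p_pair for a, b in pairs)
cov = sum((a - mean1) * (b - mean2) * p_pair for a, b in pairs)

print(marg1 == marg2, marg1[1], cov)
```

Both marginals assign probability 1/6 to each chip, exactly as for a single draw, while the covariance is negative: a high first chip makes a high second chip slightly less likely.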

6-2 SAMPLE SUM 



Now we are ready to use the heavy artillery drawn up in Chapter 5.
First consider S, the sum of the sample observations, defined as:

$S = X_1 + X_2 + X_3 + \cdots + X_n$ (6-2)

The expected value of S is obtained by using⁴ Theorem (5-29), as:

$E(S) = E(X_1) + E(X_2) + \cdots + E(X_n)$ (6-3)

4 By Theorem (5-29):

$E(S) = E(X_1 + X_2 + \cdots + X_n) = E[(X_1 + X_2 + \cdots + X_{n-1}) + X_n]$
$= E(X_1 + X_2 + \cdots + X_{n-1}) + E(X_n)$
$= E[(X_1 + X_2 + \cdots + X_{n-2}) + X_{n-1}] + E(X_n)$

Again by Theorem (5-29):

$= E(X_1 + X_2 + \cdots + X_{n-2}) + E(X_{n-1}) + E(X_n)$
$\;\;\vdots$
$= E(X_1) + E(X_2) + \cdots + E(X_n)$

This generalization of the special two-variable case in (5-29) is an example of proof by
induction.
Noting from (6-1) that each of $X_1, X_2, \ldots, X_n$ has the same distribution as the
population, it follows that each has the same mean as the population ($\mu$).
(6-3) can therefore be written:

$E(S) = \mu + \mu + \cdots + \mu$

$E(S) = n\mu$ (6-4)

or

$\mu_S = n\mu$ (6-5)

Thus, the expected value of a sample sum is simply the mean of the parent 
population times the sample size. 

In the same way, the variance of S is obtained by using Theorem (5-33):

$\operatorname{var} S = \operatorname{var}(X_1 + X_2 + \cdots + X_n)$
$= \operatorname{var} X_1 + \operatorname{var} X_2 + \cdots + \operatorname{var} X_n$ (6-6)

Note that this depends on the assumed independence of $X_1, X_2, \ldots, X_n$.
Again, since all the $X_1, X_2, \ldots, X_n$ have the same distribution as the
population, they also have the variance $\sigma^2$ of the population. Thus (6-6) becomes:

$\operatorname{var} S = \sigma^2 + \sigma^2 + \cdots + \sigma^2$
$= n\sigma^2$ (6-7)

or

$\sigma_S = \sqrt{n}\,\sigma$ (6-8)


Formulas (6-5) and (6-8) are illustrated in Figure 6-1a. As another example,
suppose a machine produces a population of bicycle chain links with
average length $\mu = .40$ inch and standard deviation $\sigma = .02$ inch. A chain
is made by joining together a random sample of 100 of these links. Its length
S is a random variable, fluctuating from sample to sample. Its expected length
is

$\mu_S = n\mu = 100(.40) = 40.0$ inches

Moreover, because our sample is drawn from an infinite population,
$X_1, X_2, \ldots, X_{100}$ are independent. Therefore, we may apply (6-8) to compute
the standard deviation of S.

$\sigma_S = \sqrt{n}\,\sigma = 10(.02) = .20$ inch

The student will notice that this is an example of statistical deduction;
characteristics of a sample ($\mu_S$, $\sigma_S$) have been deduced from known
characteristics ($\mu$, $\sigma$) of the parent population.
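
The chain calculation is a direct application of (6-5) and (6-8), and may be written as a small Python function. The function name `sample_sum_moments` is our own; it assumes independent observations, as (6-8) does.

```python
import math

def sample_sum_moments(mu, sigma, n):
    """Mean and standard deviation of the sum of n independent draws,
    using (6-5) and (6-8)."""
    return n * mu, math.sqrt(n) * sigma

# Bicycle chain: 100 links, each with mean .40 inch and s.d. .02 inch.
mean_S, sd_S = sample_sum_moments(0.40, 0.02, 100)
print(mean_S, sd_S)
```

If the errors in the 100 links simply accumulated without cancellation, the spread would be 100(.02) = 2 inches rather than .20 inch.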

We pause to interpret (6-5) and (6-8) intuitively. It was no surprise that
$\mu_S$ was n times $\mu$. But why should $\sigma_S$ be only $\sqrt{n}$ times $\sigma$? Typically, a sample
sum (e.g., chain) will include some individuals (links) which are oversized,
and some which are undersized, so that some cancellation occurs. Thus while
the spread in the chain ($\sigma_S$) does exceed the spread in an individual link ($\sigma$),
it is substantially less than it would be if the errors in all the links were
accumulated without cancellation ($n\sigma$).

FIG. 6-1 (a) Relation of the sample sum S to the parent population. (b) Relation of the
sample mean $\bar{X}$ to the parent population.

6-3 THE SAMPLE MEAN 

Recall the definition of the sample mean,

$\bar{X} = \frac{1}{n}(X_1 + X_2 + \cdots + X_n)$ (2-1a) repeated

$\bar{X} = \frac{1}{n} S$ (6-9)

We recognize that $\bar{X}$ is just a linear transformation of S, and hence $\bar{X}$ can
easily be analyzed in terms of S.
It is important to remember that $\bar{X}$, as well as S, is a random variable
that fluctuates from sample to sample. It seems intuitively clear that $\bar{X}$ will
fluctuate about the same central value as an individual observation, but with
less deviation because of "averaging out." We thus find plausible the formulas

$\mu_{\bar{X}} = \mu$ (6-10)

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$ (6-11)


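Proof. First, for the mean, we apply the last row of Table 5-6 to (6-9).

$\mu_{\bar{X}} = \frac{1}{n}\mu_S$

and from (6-5)

$= \frac{1}{n}(n\mu)$

$\mu_{\bar{X}} = \mu$ (6-10) proved

Now, for the variance, we apply the last row of Table 5-6 to (6-9) again.

$\sigma_{\bar{X}}^2 = \frac{1}{n^2}\sigma_S^2$

and from (6-7)

$= \frac{1}{n^2}(n\sigma^2)$

$\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$ (6-12)

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$ (6-11) proved
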

Formulas (6-10) and (6-11) are illustrated in Figure 6-1b. A graph of the
distribution of the sample mean for n = 9 and n = 25 is left as an exercise;
this will confirm how this distribution concentrates about $\mu$ as sample size
increases.

We review this section by reconsidering a familiar problem: the rolling
of a die. Two rolls ($X_1$, $X_2$) can be regarded as a sample of 2 taken from the
infinite population of all possible rolls of the die. This is also equivalent to a
sampling of 2 chips from a bowl, as discussed in Problem 5-26. The
probability distribution of the parent population is shown in Table 6-2a, along with
its mean ($\mu$) and standard deviation ($\sigma$).

Because this experiment has such simple probability characteristics, we
can also compute the probability distribution of S and of $\bar{X}$ for a sample of
2 rolls of the die as shown in Table 6-2b; the moments of both S and $\bar{X}$ are
also calculated in this table.



Table 6-2 (a) Probability Distribution of the Roll of a Die (Population)

x        p(x)       x p(x)

1        1/6        1/6
2        1/6        2/6
3        1/6        3/6
4        1/6        4/6
5        1/6        5/6
6        1/6        6/6

                    $\mu$ = 21/6 = 3.5

similarly           $\sigma = \sqrt{35/12} = 1.71$


(b) Probability Distribution of the Sample S and $\bar{X}$, with n = 2

(1)                          (2)      (3)                          (4)        (5)
Outcome Set or
Sample Space                 Sum      Mean
(First Die, Second Die)      s        x̄        Probability         s p(s)     x̄ p(x̄)

(1,1)                        2        1        1/36                2/36       1/36
(1,2) (2,1)                  3        1.5      2/36                6/36       3/36
(1,3) (2,2) (3,1)            4        2        3/36                12/36      6/36
  .                          5        2.5      4/36                  .          .
  .                          6        3        5/36                  .          .
36 equiprobable              7        3.5      6/36                  .          .
outcomes                     8        4        5/36
  .                          9        4.5      4/36
  .                          10       5        3/36
  .                          11       5.5      2/36
(6,6)                        12       6        1/36

                                                        $\mu_S$ = 252/36     $\mu_{\bar{X}}$ = 126/36
                                                            = 7.0               = 3.5

                                                        similarly            similarly

                                                        $\sigma_S$ = 2.4     $\sigma_{\bar{X}}$ = 1.2
Table 6-2 (c) Alternative Calculation of Mean and Variance

             Direct Calculation      On the Other Hand, This Relevant Formula Gives
             from Table 6-2b         the Short-cut Calculation (using population
Moment       Gives                   $\mu$ and $\sigma$ from Table 6-2a)

$\mu_S$             7.0              (6-5)     $\mu_S = n\mu = 2(3.5) = 7$
$\sigma_S$          2.4              (6-8)     $\sigma_S = \sqrt{n}\,\sigma = \sqrt{2}(1.71) = 2.4$
$\mu_{\bar{X}}$     3.5              (6-10)    $\mu_{\bar{X}} = \mu = 3.5$
$\sigma_{\bar{X}}$  1.2              (6-11)    $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 1.71/\sqrt{2} = 1.2$



FIG. 6-2 Throwing a die twice (a specific illustration of Fig. 6-1). (a) Relation of the
sample sum S to the parent population; $\mu_S$ = 7.0, $\sigma_S$ = 2.4. (b) Relation of the
sample mean $\bar{X}$ to the parent population. (Note. In order to facilitate graphing, the
probabilities were converted to probability densities, so that they would all have the
same comparable area = 1.)
In Table 6-2c we show how these moments could have been obtained
more simply using the formulas of this section. Finally, this die-tossing
example is summarized in Figure 6-2.
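
The direct and short-cut calculations for two rolls of a die can be compared by enumerating the 36 equiprobable outcomes. The Python sketch below is one way to do so; the variable names are ours, and exact fractions are used throughout.

```python
from fractions import Fraction
from itertools import product

# Population: one roll of a fair die.
faces = range(1, 7)
mu = sum(Fraction(x, 6) for x in faces)               # population mean
var = sum((x - mu) ** 2 * Fraction(1, 6) for x in faces)  # population variance

# Sample of n = 2 rolls: 36 equiprobable outcomes.
outcomes = list(product(faces, repeat=2))
p = Fraction(1, len(outcomes))

# Direct calculation for the sample sum S.
mu_S = sum((a + b) * p for a, b in outcomes)
var_S = sum((a + b - mu_S) ** 2 * p for a, b in outcomes)

# Direct calculation for the sample mean.
mu_Xbar = sum(Fraction(a + b, 2) * p for a, b in outcomes)
var_Xbar = sum((Fraction(a + b, 2) - mu_Xbar) ** 2 * p for a, b in outcomes)

print(mu_S, float(var_S) ** 0.5)        # compare with n*mu and sqrt(n)*sigma
print(mu_Xbar, float(var_Xbar) ** 0.5)  # compare with mu and sigma/sqrt(n)
```

The direct results agree with the short-cut formulas: the variance of S is twice the population variance, and the variance of the sample mean is half of it.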

PROBLEMS 

6-1 True or false? If false, correct the errors:

When a die is rolled twice, the average of the 2 numbers ($\bar{X}$) is a
random variable having an expectation of 3½, the same expectation as
for a single roll X. This illustrates $\mu_{\bar{X}} = \mu$.

The range of $\bar{X}$ is from 1 to 6, also the same as for a single roll X.
However, $\bar{X}$ does not take on all values equally likely; the extreme
values are rare. Thus $\bar{X}$ has a smaller standard deviation than X,
illustrating $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.

Incidentally, this illustrates why the range of a random variable is
a better measure of spread than the standard deviation.
6-2 True or false? If false, correct the errors:

If 10 men were randomly sampled from the population of Table
6-1, and then laid end to end, the expectation of the total length would
be $n\mu = 678$ inches. The total length would vary (from sample to
sample) with a standard deviation of $n\sigma = 53$ inches.

On the other hand, if the 10 men in the random sample were
averaged, the expectation of the average would be $\mu = 67.8$ inches, and
its standard deviation would be $\sigma = 5.3$ inches. This is how the long
and short men in the sample tend to "average out," making $\bar{X}$ fluctuate
less than a single observation.
6-3 (Classroom Exercise)

(a) Make a relative frequency (probability) graph of the population of
heights of the men in the class.

(b) Take a few random samples of size 4 (with replacement), showing
how in each sample the tall students tend to be offset by short students.

(c) For each sample, calculate $\bar{X}$. Plot the values of $\bar{X}$ and compare
to (a).

6-4 The population of employees in a certain large office building has
weights distributed around a mean of 150 pounds, with a standard
deviation of 20 pounds. A random group of 25 employees get in the
elevator each morning. Find the mean and variance of:

(a) The total weight S.

(b) The average weight $\bar{X}$.

=> 6-5 A bowl is full of many chips, one-third marked 2, one-third marked 4,
and one-third marked 6.
(a) When one chip is drawn, let X be its number. Find $\mu$ and $\sigma$ (the
population mean and standard deviation).

(b) When a sample of 2 chips is drawn, let $\bar{X}$ be the sample mean. Find

(1) The probability table of $\bar{X}$.

(2) From this calculate $\mu_{\bar{X}}$ and $\sigma_{\bar{X}}$; check your answers using
(6-10) and (6-11).

(c) Repeat (b) for a sample of 3 chips.

(d) Graph $p(\bar{x})$ for each case above, i.e., for sample size n = 1, 2, 3.
Comparison is facilitated by using probability density, i.e., by using a
bar graph with probability = area = (height)(width).

As n increases, notice that $p(\bar{x})$ becomes more concentrated around
$\mu$. What else is happening to the shape of $p(\bar{x})$?

6-4 THE CENTRAL LIMIT THEOREM 

In the preceding section we found the mean and standard deviation of $\bar{X}$.
The one question we have not yet addressed is the shape of its distribution.
We consider two cases.



(a) The Distribution of the Sample Mean When the Population
is Normal

In this case $\bar{X}$ is exactly normal. This follows from a theorem on linear
combinations, which we quote without proof:

If X and Y are normal, then any linear combination
Z = aX + bY is also a normal random variable. (6-13)

With a normal population, each observation in a sample $X_1, X_2, \ldots, X_n$ is
normal. The sample mean $\bar{X}$ can be written as a linear combination of these
n normal variables,

$\bar{X} = \frac{1}{n}X_1 + \frac{1}{n}X_2 + \cdots + \frac{1}{n}X_n$ (6-14)

so that (6-13) can be used to establish that $\bar{X}$ is normal. Finally, we
re-emphasize that its distribution concentrates about $\mu$ as sample size n increases
(see (6-11)).

(b) The Distribution of $\bar{X}$ When the Population is Not Normal

It is surprising that, even in this case, most of the same conclusions
follow. As an example, consider the bowl of 3 kinds of chips in Problem 6-5.
This is obviously a nonnormal population; in fact, it is a rectangular
distribution. As a larger and larger sample is taken, the distribution of $\bar{X}$ is graphed⁵
in Figure 6-3a. As well as the increasing concentration of this distribution,
we notice the tendency to the normal bell shape.

This same tendency to the normal shape occurs for the sample of dice
throws (n = 1, 2, 3, . . . throws from a population of all possible throws),
as shown in Figure 6-3b.

Finally, in Figure 6-3c a third population is shown, having chips
numbered 2, 4, and 6, with proportions 1/4, 1/4, and 1/2. Sample means from
this population also show the same tendency to normality.

These three examples display an astonishing pattern: the sample mean
becomes normally distributed as n grows, no matter what the parent
population is. This pattern is of such central importance that mathematicians have
formulated it as

The Central Limit Theorem. As the sample size n
increases, the distribution of the mean, $\bar{X}$, of a
sample taken from practically⁶ any population
approaches a normal distribution (with mean $\mu$
and standard deviation $\sigma/\sqrt{n}$). (6-15)

The central limit theorem is not only remarkable, but very practical as
well. For it completely specifies the distribution of $\bar{X}$ in large samples, and is
therefore the key to large-sample statistical inference. In fact, as a rule of
thumb it has been found that usually when the size n reaches about 10 or 20,
the distribution of $\bar{X}$ is practically normal. This is certainly the case in the 3
examples of Figure 6-3.

In conclusion, we can assume that $\bar{X}$ is normal for any sample taken
from a normal population, and for large samples taken from practically any
population. With our previous conclusions on the mean and standard
deviation of $\bar{X}$, we can now be very specific in our deduction about a sample mean
taken from a known population.
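
The tendency toward the normal shape can also be examined numerically. The Python sketch below computes the exact distribution of the sample mean for the third population (chips numbered 2, 4, and 6 in proportions 1/4, 1/4, and 1/2), assuming independent draws. As a rough indicator of shape it reports the skewness, which is zero for a normal distribution; the choice of skewness as the indicator, and the function names, are our own.

```python
from fractions import Fraction

def mean_distribution(values, probs, n):
    """Exact distribution of the mean of n independent draws."""
    dist = {0: Fraction(1)}  # distribution of the running sum, before any draw
    for _ in range(n):
        new = {}
        for s, ps in dist.items():
            for v, pv in zip(values, probs):
                new[s + v] = new.get(s + v, 0) + ps * pv
        dist = new
    # Convert each possible sum s into the corresponding sample mean s/n.
    return {Fraction(s, n): pr for s, pr in dist.items()}

def moments(dist):
    """Mean, variance, and skewness of a discrete distribution."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    third = sum((x - mu) ** 3 * p for x, p in dist.items())
    return mu, var, float(third) / float(var) ** 1.5

values = [2, 4, 6]
probs = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]

results = {n: moments(mean_distribution(values, probs, n)) for n in (1, 2, 5, 10)}
for n, (mu, var, skew) in results.items():
    print(n, mu, var, round(skew, 3))
```

For every n the mean stays at the population mean and the variance equals the population variance divided by n, while the skewness shrinks toward zero as n grows, in keeping with the drift toward the symmetric bell shape.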



FIG. 6-3 The limiting normal shape for $p(\bar{x})$, with panels for n = 1, 2, 3, 5, and 10.
(a) Bowl of three kinds of chips. (b) Bowl of six kinds of chips (or die). (c) Bowl of
three kinds of chips of different frequency.

5 The student has already done the first 3 graphs of Figure 6-3a (in Problem 6-5), and the
first 2 graphs of Figure 6-3b (in Table 6-2). The rest of the graphs may be similarly
calculated.

6 The one qualification is that the population have finite variance. For a proof of this
theorem, see, for example, P. Hoel, Introduction to Mathematical Statistics, 3rd ed.,
pp. 143-5, John Wiley & Sons, 1962.

Example

Consider the marks of all students on a statistics test. If the marks have
a normal distribution with a mean of 72 and standard deviation of 9, compare
(1) the probability that any one student will have a mark over 78 with (2)
the probability that a sample of 10 students will have an average mark over 78.

1. The probability that a single student will have a mark over 78 is
found by standardizing the normal population

$\Pr(X > 78) = \Pr\left(\frac{X - \mu}{\sigma} > \frac{78 - 72}{9}\right)$

$= \Pr(Z > .67) = .50 - .2486 = .2514$

2. Now consider the distribution of the sample mean. From the theorems
above we know it is normal, with a mean of 72 and a standard deviation
$\sigma/\sqrt{n} = 9/\sqrt{10}$. From this we calculate the probability of a sample mean
exceeding 78 to be:

$\Pr(\bar{X} > 78) = \Pr\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > \frac{78 - 72}{9/\sqrt{10}}\right)$

$= \Pr(Z > 2.11)$

$= .0174$ (6-16)

Hence, although there is a reasonable chance (about 1/4) that a single
student will get over 78, there is very little chance (about 1/60) that a sample
average of ten students will perform this well. This is shown in Figure 6-4.

FIG. 6-4 Comparison of probabilities for the population and for the sample mean.
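
The two probabilities in this example can also be obtained without a normal table. The Python sketch below evaluates the upper tail of the standard normal distribution with the complementary error function from the standard library; the helper name `upper_tail` is our own. Because the code does not round z to two decimals, its answers may differ from the tabled values in the third decimal place.

```python
import math

def upper_tail(z):
    """Pr(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

mu, sigma, n, cutoff = 72, 9, 10, 78

# (1) A single student: standardize with sigma.
p_single = upper_tail((cutoff - mu) / sigma)

# (2) The mean of 10 students: standardize with sigma / sqrt(n).
p_mean = upper_tail((cutoff - mu) / (sigma / math.sqrt(n)))

print(round(p_single, 3), round(p_mean, 3))
```

The contrast between roughly one chance in four and roughly one chance in sixty comes entirely from dividing the standard deviation by $\sqrt{n}$.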



PROBLEMS 

6-6 The weights of packages filled by a machine are normally distributed
about a mean of 25 ounces, with a standard deviation of one ounce.
What is the probability that n packages from the machine will have
an average weight of less than 24 ounces if n = 1, 4, 16, 64?

6-7 Suppose that the education level among adults in a certain country
has a mean of 11.1 years, and a variance of 9. What is the probability
that in a random survey of 100 adults you will find an average level
of schooling between 10 and 12?

6-8 Does the central limit theorem (6-15) also hold true for the sample
sum? Justify briefly.
6-9 An elevator is designed with a load limit of 2000 lb. It claims a capacity
of 10 persons. If the weights of all the people using the elevator are
normally distributed with a mean of 185 lb and a standard deviation
of 22 lb, what is the probability that a group of 10 persons will exceed
the load limit of the elevator?
6-10 Suppose that bicycle chain links have lengths distributed around a
mean $\mu = .50$ cm, with a standard deviation $\sigma = .04$ cm. The
manufacturer's standards require the chain to be between 49 and 50 cm long.

(a) If chains are made of 100 links, what proportion of them meets
the standards?

(b) If chains are made of only 99 links, what proportion now meets
the standards? How many links should be put in a chain?

(c) Using 99 links, to what value must $\sigma$ be reduced (how much must
the quality control on the links be improved) in order to have 90
percent of the chains meet the standards?

(6-11) The amount of pocket money that persons in a certain city carry has
a nonnormal distribution with a mean of $9.00 and a standard
deviation of $2.50. What is the probability that a group of 225 individuals
will be carrying a total of more than $2100?
6-12 In Problems 6-6 to 6-11, the variance formulas required that the n
individuals in the sample were independently drawn. Do you think
this is a questionable assumption? Why?

*6-13 A farmer has 9 wheatfields planted. The distribution of yield from 
each field has a mean of 1000 bushels and variance 20,000. Further- 
more, the yields of any 2 fields are correlated, because they share the 
same weather conditions, weed control, etc; in fact the covariance is 
10,000. Letting S denote the total yield from all 9 fields, find 

(a) The mean and variance of S. [Hint. How must the footnote proof 
to (5-32) be adjusted?] 

(b) Pr (S < 8,000), assuming S is normal. 



*6-5 SAMPLING FROM A FINITE POPULATION, 
WITHOUT REPLACEMENT 

In the preceding analysis, we have assumed either sampling with replace- 
ment, or alternatively, sampling from an infinite population — in which case 
it doesn't matter whether we replace or not. This leaves one remaining 
possibility — sampling from a finite population, without replacement. 

* This is a starred section, and like a starred problem, it is optional; the student may 
skip it without loss of continuity. 
We have already argued in Section 6-1 that all the $X_i$ in a sample of n
observations ($X_1, X_2, \ldots, X_n$) will have the same (marginal) distribution
whether or not we replace; i.e., equation (6-1) holds regardless, so that
(6-5) still follows from (6-3):

$\mu_S = n\mu$ (6-5) repeated

And similarly

$\mu_{\bar{X}} = \mu$ (6-10) repeated




On the other hand, the variance of the sample mean does depend on whether
or not we replace; it is easy to see why. Suppose we sample 10 of the heights
of the male students on a college campus; suppose further that the first
student we sample is the star of the basketball team (say Lew Alcindor, at
7 feet 1 inch). Clearly, we now face the problem of a sample average that is
"off target," specifically, too high. If we replace, then in the next 9 men
chosen, Alcindor could turn up again, throwing our sample mean even
further off target on the high side. But if we don't replace, then we don't have
to worry about Alcindor again. In summary, sampling without replacement
yields a more reliable sample mean (i.e., $\bar{X}$ has less variance), because extreme
values once sampled, cannot return to haunt us again.

Formally, the argument runs as follows. If we sample without
replacement, then $X_1, X_2, \ldots, X_n$ are not independent. Hence all our theorems on the
variance of S and $\bar{X}$ above, based on the independence assumption, do not
hold true. Specifically, (6-7), which assumed replacement, now must be
modified to:

$\operatorname{var} S = \sigma_S^2 = n\sigma^2 \left(\frac{N - n}{N - 1}\right)$ (sampling without replacement) (6-17)

where N = population size, and n = sample size. Furthermore (6-12),
which also assumed replacement, must be similarly modified to:

$\operatorname{var} \bar{X} = \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} \left(\frac{N - n}{N - 1}\right)$ (sampling without replacement) (6-18)


Although we do not prove these two formulas, we interpret them:

1. The variance of $\bar{X}$ without replacement (6-18) is less than the variance
of $\bar{X}$ with replacement (6-12); (this is the formal confirmation of our
intuitive example of heights of college students). This occurs because the
"reduction factor,"

$\frac{N - n}{N - 1}$ (6-19)

appearing in (6-18) is less than one. [Unless of course, the sample size n
is only one. In this case, no distinction can be made between replacement
and nonreplacement, and (6-12) and (6-18) must necessarily coincide.
If you have wondered where the 1 came from in the denominator, you can
see that it is necessary, in order to logically make (6-12) and (6-18) equivalent,
as they must be, for a sample size of one.]

2. When n = N, the sample coincides with the whole population, every
time. Hence every sample mean must be the same: the population mean.
The variance of the sample mean, being a measure of its spread, must be zero.
This is reflected in (6-19) having a zero numerator; and var $\bar{X}$ in (6-18)
becomes zero. (Note that with replacement this is not the case; in this
instance, n = N does not guarantee that the sample and the population are
identical.)

3. On the other hand, when n is much smaller than N (e.g., when 200
men are sampled from 80 million), then (6-19) is practically 1, so that var $\bar{X}$
is practically the same as with replacement. This, of course, coincides with
common sense; if the population is very large, it makes very little difference
whether or not the observations are thrown back in again before continuing
sampling.
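
Although (6-18) is not proved here, it can at least be checked on a small case. The Python sketch below takes a bowl of 6 chips numbered 1 to 6 (an arbitrary choice for illustration), lists every equally likely sample of size n drawn without replacement, and compares the exact variance of the sample mean with the value given by (6-18). The function name is our own.

```python
from fractions import Fraction
from itertools import combinations

def var_mean_without_replacement(population, n):
    """Exact variance of the sample mean over all equally likely
    samples of size n drawn without replacement."""
    samples = list(combinations(population, n))
    p = Fraction(1, len(samples))
    means = [Fraction(sum(s), n) for s in samples]
    mu = sum(m * p for m in means)
    return sum((m - mu) ** 2 * p for m in means)

population = [1, 2, 3, 4, 5, 6]
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance

checks = {}
for n in range(1, N + 1):
    exact = var_mean_without_replacement(population, n)
    formula = (sigma2 / n) * Fraction(N - n, N - 1)  # right side of (6-18)
    checks[n] = (exact, formula)
    print(n, exact, formula)
```

The two columns agree for every n. In particular, n = 1 gives the population variance itself, and n = N gives zero, matching points 1 and 2 above.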



PROBLEMS 

*6-14 In the game of bridge, cards are allotted points as follows:

Cards                     Points

All cards below jack      0
Jack                      1
Queen                     2
King                      3
Ace                       4

(a) For the population of 52 cards, find the mean number of points,
and the variance.

(b) In a randomly dealt hand of 13 cards, the number of points Y is a
random variable. What are the mean and variance of Y? (Bridge
players beware: no points counted for distribution.)

(c) What is Pr(Y > 13)? (Hint. The distribution shape is approximately
normal, as we might hope from the central limit theorem.)
*6-15 Rework Problem 6-9, assuming the population of people using the
elevator is no longer very large, but rather

(a) N = 500.

(b) N = 50.



6-6 SAMPLING FROM BERNOULLI POPULATIONS 

We have examined the distribution of a sample mean and a sample sum;
the final statistic that we study is the one referred to in our poll of U.S. voters
in Chapter 1: the sample proportion P.



(a) The Bernoulli Population 

First, we must be clear on the population from which the sample is 
drawn. We conceive of this as being made up of a large number of individuals, 
all marked D or R (Democrat or Republican). We can make this look like 
the familiar bowl of chips by relabelling each D with a 1 and each R with a 0. 
Thus, if the voting population of 150 million is comprised of 84 million 
Democrats and 66 million Republicans, the population probability distribution 
would be as shown in Table 6-3. 

Table 6-3 A Bernoulli Variable 

                 x     Frequency       p(x)                               x p(x) 

Republican       0     66,000,000      66,000,000/150,000,000 = .44         0 
Democrat         1     84,000,000      84,000,000/150,000,000 = .56        .56 

                       N = 150,000,000                                  μ = .56 

The population proportion π of Democrats is .56, which is also the 
probability, in sampling one individual at random, that a Democrat will be 
chosen. This is called a "Bernoulli" population and its distribution is graphed 
later in Figure 6-6a. This is the simplest kind of probability distribution, 
lumped at only two values, 0 and 1. (Note that this population is as far from 
being normal as any that we will encounter). Its mean and variance are easily 
computed in Table 6-4. In our example, μ = .56, and σ = .50. 

The reason that the arbitrary values of 0 and 1 were assigned to the 
population is now clear. This ensures that μ and π coincide. 



Table 6-4 Calculation of μ and σ² for a Bernoulli Population 

   x        p(x)        x p(x)        (x − μ)        (x − μ)² p(x) 

   0       (1 − π)        0             −π           (−π)² (1 − π) 
   1          π           π            1 − π         (1 − π)² π 

$$\mu = \pi \tag{6-20}$$

$$\sigma^2 = \pi(1 - \pi) \tag{6-21}$$

$$\sigma = \sqrt{\pi(1 - \pi)} \tag{6-22}$$
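The arithmetic of Table 6-4 can be reproduced directly from the two-point distribution. This is a minimal sketch, not part of the original text; the function name `bernoulli_moments` is hypothetical, and π = .56 is the value from the voter example.

```python
import math


def bernoulli_moments(pi):
    """Mean, variance and standard deviation of a 0-1 (Bernoulli) population."""
    dist = {0: 1 - pi, 1: pi}  # x -> p(x), as in Table 6-4
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var, math.sqrt(var)


mu, var, sigma = bernoulli_moments(0.56)
print(mu, var, sigma)  # mu equals pi and var equals pi * (1 - pi)
```

Summing the last column by hand gives the same result: π²(1 − π) + (1 − π)²π = π(1 − π)[π + (1 − π)] = π(1 − π).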

(b) Bernoulli Sampling 

We now ask, "What can we expect of a sample drawn from this sort of 
population?" The population is so large that even without replacement, the 
n observations are practically independent; the probability of choosing a 
Democrat remains practically .56 regardless of whether or not we replace. 

If we take a sample of n = 50, let us say, we might obtain, for example, 
the following 50 numbers: 

0 1 1 0 1 0 0 1 0 1 1 1 . . . 0 1 1 (6-23) 

The sample sum, of course, will be just the number of Democrats in the 
sample. We recall encountering this before as a binomial random variable in 
Table 4-3; thus a binomial random variable is simply a sample sum in 
disguise. 

Why is this interesting coincidence of any practical value? Suppose we 
wish to calculate the binomial probability of at least 30 Democrats in 50 
trials. We could evaluate the probability of exactly 30 Democrats, of 31, 32, 
and so on. This would require a major computational effort: not only are 
some twenty odd probabilities involved, but in addition, each is extremely 
difficult to calculate. 7 But we recognize that this is equivalent to calculating 
the probability that S, the sample sum taken from a Bernoulli population, 
is at least 30 in a sample of 50. This is very easy to calculate, because in the 
previous section we have completely described the distribution of any sample 
sum. 

7 As an exercise, the student should consider whether it is feasible to evaluate the prob- 
ability of getting 30 Democrats in a sample of 50, which is: 

$$\binom{50}{30} (.56)^{30} (.44)^{20}$$

S in fact is approximately normally distributed, 8 with the following mean 
and variance: 

From (6-5) and (6-20), 

$$\mu_S = n\pi \tag{6-24}$$

(Binomial mean) 

From (6-7), and using (6-21), 

$$\sigma_S^2 = n\pi(1 - \pi)$$

$$\sigma_S = \sqrt{n\pi(1 - \pi)} \tag{6-25}$$

(Binomial standard deviation) 

Hence the probability of at least 30 Democrats in a sample of 50 is: 

Pr (S ≥ 30) 

which, in standardized form is 

$$\Pr\left(\frac{S - \mu_S}{\sigma_S} \ge \frac{30 - 28}{\sqrt{12.3}}\right) \simeq \Pr(Z \ge .58) \simeq .28 \tag{6-26}$$

To confirm the usefulness of this normal approximation to the binomial, the 
student should compare this simple solution with the calculations involved 
in evaluating some twenty-odd expressions, each like the one in the footnote 
on p. 120. The normal approximation to the binomial is graphed 9 in 
Figure 6-5. 



8 For large n, by the central limit theorem. A useful rule of thumb is that n should be large 
enough to make nπ > 5 and n(1 − π) > 5. If n is large, yet π is so small that nπ < 5, 
then there is a better approximation than the normal, called the Poisson distribution. 

9 This graph clearly indicates that a better approximation to the binomial histogram would 
be the area under the curve above 29.5, not 30. This peculiarity arises from trying to 
approximate a discrete variable with a continuous one, and is therefore called the continuity 
correction. Our better approximation is 

$$\Pr\left(\frac{S - \mu_S}{\sigma_S} \ge \frac{29.5 - 28}{\sqrt{12.3}}\right) \simeq \Pr(Z \ge .43) \simeq .334 \tag{6-27}$$

To keep the analysis uncluttered, this continuity correction is ignored in the rest of the 
book. 
FIG. 6-5 Normal approximation to the binomial. (Compare Fig. 6-1a.) 
[Figure: horizontal axis s = number of Democrats in sample. The shaded area 
(exact answer) is Pr ≈ .337; the normal area above 29.5 (with continuity 
correction) is Pr = .334; the normal area above 30 (without continuity 
correction) is Pr = .28.] 
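With a computer the "major computational effort" of the exact binomial sum is no longer an obstacle, so the three figures quoted in Fig. 6-5 can be compared directly. The following is a sketch only, assuming Python 3.8 or later for `math.comb`; the helper names `binomial_tail` and `normal_upper_tail` are hypothetical.

```python
import math


def binomial_tail(n, p, k):
    """Exact Pr(S >= k) for a binomial sample sum S of n 0-1 observations."""
    return sum(math.comb(n, s) * p**s * (1 - p) ** (n - s)
               for s in range(k, n + 1))


def normal_upper_tail(z):
    """Pr(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


n, pi = 50, 0.56
mu_s = n * pi                             # (6-24)
sigma_s = math.sqrt(n * pi * (1 - pi))    # (6-25)

exact = binomial_tail(n, pi, 30)
plain = normal_upper_tail((30 - mu_s) / sigma_s)        # as in (6-26)
corrected = normal_upper_tail((29.5 - mu_s) / sigma_s)  # as in (6-27)

print(f"exact {exact:.3f}  normal {plain:.3f}  with correction {corrected:.3f}")
```

The corrected normal area should land much closer to the exact sum than the uncorrected one, which is the point of footnote 9.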

We now turn to the second major issue of this section: with π known, 
what is the distribution of the sample proportion P? 

Just as the total number of successes is merely the sample sum in 
disguise, so the sample proportion is merely the sample mean in disguise: 

$$P = \frac{S}{n} = \bar{X} \tag{6-28}$$

All our theory developed for X̄ can now be applied to determine the distribution 
of the sample statistic P. Thus, from (6-10) and (6-20) the mean of P is: 

$$\mu_P = \pi \tag{6-29}$$

From this we note that, on the average, the sample proportion P is on target, 
i.e., its average value is equal to the population proportion, which (we shall 
see in Chapter 8) it will be used to estimate. But any specific sample P will 
be subject to sampling variation and will typically fall above or below π. 
From (6-11) and (6-22) we discover that its standard deviation is 

$$\sigma_P = \sqrt{\frac{\pi(1 - \pi)}{n}} \tag{6-30}$$

Finally, since P is a sample mean, its distribution is normal for large samples 
(central limit theorem). 
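Equations (6-29) and (6-30), together with the normal shape, are enough to compute any probability about P. The sketch below assumes Python 3.8 or later for `statistics.NormalDist`; the function name `proportion_distribution` is hypothetical, and the inputs π = .56 and n = 50 are the voter values used in this section.

```python
import math
from statistics import NormalDist


def proportion_distribution(pi, n):
    """Approximate normal distribution of the sample proportion P."""
    mu_p = pi                                # (6-29)
    sigma_p = math.sqrt(pi * (1 - pi) / n)   # (6-30)
    return NormalDist(mu_p, sigma_p)


dist = proportion_distribution(0.56, 50)
# Probability that between 50 and 60 percent of the sample are Democrats
# (no continuity correction).
prob = dist.cdf(0.60) - dist.cdf(0.50)
print(round(dist.stdev, 3), round(prob, 3))
```

Because no continuity correction is applied and σ is not rounded, the result may differ slightly in the third decimal from a hand calculation with tables.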

As an example, consider the population of voters shown in Figure 6-6a. 
What is the probability that in a random sample of 50 voters between 50 
and 60 percent will be Democrats? From (6-29) and (6-30) 

$$\mu_P = \pi = .56$$

$$\sigma_P = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{.56(1 - .56)}{50}} = .070$$

These two values, along with our knowledge that P is normal, completely 
define the distribution of P shown in Figure 6-6b. Even though our population 
was nowhere near normal, our sample statistic P is approximately 
normal. 

FIG. 6-6 Relation of the sample proportion to the population proportion (compare 
Fig. 6-1b). (a) Population of voters. (b) In a sample of 50 voters, distribution of P. 
[Figure: panel (a) shows p(x) lumped at x = 0 and x = 1, with μ = π = .56 and 
σ = √(π(1 − π)) = .50; panel (b) shows the normal curve of P centered at 
μ_P = π = .56, with shaded area Pr (.50 < P < .60) = .5208.] 

The evaluation of the area of this normal distribution between .50 and 
.60 is now a straightforward matter: 

$$\Pr(.50 < P < .60) = \Pr\left(\frac{.50 - .56}{.070} < \frac{P - \mu_P}{\sigma_P} < \frac{.60 - .56}{.070}\right) = \Pr(-.857 < Z < .572) = .5208$$

(Note that if you want high accuracy in your answers, you should make 
continuity corrections.) 

PROBLEMS 
6-16 Suppose Gallup takes a poll of 1000 voters from a population which 
is 56 percent Democratic. Letting P be the sample proportion, find 

(a) Pr (.52 < P < .60). 

(b) Pr (P > .5), i.e., the probability that the sample will correctly 
predict the election. Note how we are beginning to answer the 
problems raised in Chapter 1. 

6-17 In tossing a fair coin 50 times, what is the probability that the propor- 
tion of heads will exceed .55? 

6-18 If a fair die is rolled 100 times, what is the probability that at least one 
quarter of these are aces? Answer two ways: 

(a) Pr (P ≥ 1/4) 

(b) Pr (total number of aces ≥ 25) 

(6-19) What is the chance that of the first 100 babies born in the New Year, 
more than 60 will be boys? 
6-20 What is the chance that of the first 8 babies born in the New Year, 
more than 6 will be boys? Answer two ways: 

(a) Exactly, using the binomial distribution. 

(b) Approximately, using the normal distribution. 

6-7 SUMMARY OF SAMPLING THEORY 

(a) General Sampling 

1. The distribution of the sample mean X̄ is approximately normal for 
large samples, say n > 10 or 20 as a rule of thumb. (Moreover, if the 
population is near normal, then a much smaller sample will be approximately 
normal.) 

2. X̄ will have an expectation equal to μ, the population expectation. 

3. If we sample without replacement, X̄ will have a variance equal to: 

$$\frac{\sigma^2}{n}\left[\frac{N - n}{N - 1}\right]$$

If the population (N) is very large, this reduces to, approximately: 

$$\frac{\sigma^2}{n}$$

which is also the formula for variance when we sample with replacement. 
Thus we may write: 

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \tag{6-31}$$

which is a useful abbreviation for "X̄ is normally distributed, with mean μ 
and variance σ²/n." 

(b) Bernoulli Sampling 

If we apply this sampling theory to a special population (chips coded 
0 and 1) then we have the solution to the proportion problem. The sample 
proportion P is just a disguised X̄, and the population proportion π is just 
a disguised μ, so that 

$$P \sim N\left(\pi, \frac{\pi(1 - \pi)}{n}\right) \tag{6-32}$$

again assuming n is sufficiently large. 

Review Problems 



6-21 Five men, selected at random from a normal population with mean 
weight μ = 160 lb and σ = 20 lb, get on an elevator. What is the 
probability that 

(a) All five men weigh more than 170? 

(b) The average weight is more than 170? 

(c) The total weight is more than 850? 

(d) Give an intuitive reason why your answers are related. 

6-22 A man at a carnival pays $1 to play a game (roulette) with the 
following payoff: 

    Y = Gross Winning      Net Winning = Y − 1      Probability 

          0                       −1                   20/38 
          $2                      +1                   18/38 

(a) What is the average net winning in a game? 

(b) What is his approximate chance of ending up a loser (net loss) 
if he plays the game: 

(1) 15 times? 

(2) 25 times? 

(3) 125 times? 

(c) How could you get an exact answer for (b)(1)? 

(d) How many times should he play if he wants to be 99% certain of 
losing? 
6-23 Fill in the blanks. 

(a) Suppose that in a certain election, the U.S. and California are 
alike in their proportion of Democrats, π, the only difference being 
that the U.S. is about 10 times as large a population. In order to get 
an equally reliable estimate of π, the U.S. sample should be ______ 
as large as the California sample. 

(b) A certain length is measured with an error, which we suppose 
for simplicity to be +2" or −2", equally likely. A sample of n inde- 
pendent measurements is taken. The sample sum S could possibly be in 
error as much as ______. However, S is likely (95%) to be in error by 
no more than ______. For example, for n = 100, these two errors are 

1. Worst possible error = 

2. Likely error < 

=> 6-24 Let X̄ be the sample mean when a die is thrown 1000 times. Intuitively 
we feel "fairly certain" that X̄ is "quite close" to μ. More precisely, 
calculate 

Pr (μ − .1 < X̄ < μ + .1) 

(6-25) In making up a budget, a housewife rounds out to the nearest 10¢. 

(a) If the budget consists of 200 items, what is the chance that the 
rounding error will exceed $1.00? 

(b) Briefly state the assumptions necessary in answer (a). 

(6-26) Suppose there are five men in a room, whose heights in inches are 
62, 65, 68, 65, 65. One man is drawn at random with his height 
denoted X. 

(a) Graph the probability function of X, i.e., the population distribution. 
Find its mean μ and variance σ². 

Suppose a sample of two men is drawn, with replacement, and 
the sample mean X̄ is calculated. 

(b) Construct a table of the probability function of X̄. (Hint. List the 
possible samples, i.e., the sample space. Are the outcomes equally 
likely? For each outcome, calculate X̄.) 

(c) Graph the probability function of X̄. 

(d) Find the mean and variance of X̄ from its probability distribution. 

(e) Check your answers to (d) using the equations of this chapter. 

(f) Is the following a valid interpretation of these formulas? If not, 
correct it: 

X̄ fluctuates around μ, sometimes larger, sometimes smaller, but 
exactly equal to μ on the average (μ_X̄ = μ). X̄, the average of n 
observations, does not fluctuate as much as a single observation, 
however (σ²_X̄ = σ²/n < σ²). This is to be expected, because in a 
sample, a large observation will often be "cancelled out" by a 
small observation, or at least swamped by the rest of the observations 
which will be more typical. 

(6-27) Repeat Problem 6-26 for a sample of 2 men drawn without replacement. 
Why is this sampling without replacement preferable? 

Pr (i - .01 < /> < i + .oi) 



chapter 7 



Estimation I 



7-1 INTRODUCTION 



Before beginning statistical induction, we pause in Table 7-1 to review 
the concepts of sample and population. 

Table 7-1 Review of Sample versus Population 

    Random Sample               is a Random Subset of       the Population 

 1. Relative frequencies f/n                                Probabilities p(x) 
    are used to compute                                     are used to compute 
 2. X̄ and s²                                                μ and σ² 
    which are examples of                                   which are examples of 
 3. Random statistics, or                                   Fixed parameters, or 
 4. Estimators                                              Targets 

It is essential to remember that the population is fixed, so that its mean 
μ and variance σ² are constants (though generally unknown). These are 
called population parameters. 

By contrast, the sample mean X̄ and sample variance s² are random 
variables, varying from sample to sample, with a certain probability distribution. 
For example, the distribution of X̄ was found to be approximately 
X̄ ~ N(μ, σ²/n) in Chapter 6. A random variable such as X̄ or s² which is 
calculated from the observations in a sample is given the technical name 
sample statistic. 

As a specific example of statistical inference, suppose we wish to estimate 
the average height of American men on a large Midwestern campus. This 
population mean μ is a fixed, but unknown parameter. We estimate it by 
taking a sample of 36 students, and compute the sample mean X̄; let us 
suppose this turns out to be 68 inches. We shall see in the next section that this 
is our best single estimate or "point estimate" of μ. But we also know, from 
the theory of our previous chapter that unless we are extremely lucky in our 
sample this estimate X̄ will not be exactly on target, but rather a bit high or 
a bit low. Technically, X̄ is distributed around μ, above and below it, as 
shown in Figure 6-1b. If we want to be reasonably confident that our inference 
is correct, we cannot estimate μ to be precisely equal to our observed X̄; 
instead we must estimate that μ is bracketed by some interval, known as a 
confidence interval, of the following form. 

μ = X̄ ± an error allowance (7-1) 

As an example, we might estimate 

μ = 68 ± 3 inches (7-2) 

We observe that in evaluating the right-hand side of (7-1), there is no problem 
with the estimator X̄; this is a simple calculation of the average of the sample 
values. The problem is the evaluation of the error allowance. 

In this section we will show that we can be very specific in our interval 
estimate of μ, because we could be specific about the distribution of X̄ 
around μ in the previous chapter. To keep inferences simple, we assume X̄ 
is normally distributed according to the assumptions of Section 6-3, so that 
its distribution is that of Figure 7-1. 

First we must decide: "How confident do we wish to be that our interval 
estimate is right, that it does bracket μ?" It is common to choose 95% 
confidence; in other words, we will be using a technique that will give us, in 
the long run, a correct interval estimate 19 times out of 20. 

To get a confidence level of 95%, we select the smallest range under the 
normal distribution of X̄ that will just enclose a 95% probability. Obviously, 
this is the middle chunk, leaving 2½% probability excluded in each tail, as 
shown in Figure 7-1. From our normal tables, we note that this involves 
going above and below the mean by 1.96 standard deviations of X̄. We 
therefore write 

$$\Pr\left(\mu - 1.96\frac{\sigma}{\sqrt{n}} < \bar{X} < \mu + 1.96\frac{\sigma}{\sqrt{n}}\right) = 95\% \tag{7-3}$$

The bracketed inequalities may be solved for μ, "turned around" so to 
speak, obtaining the equivalent statement: 1 

$$\Pr\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 95\% \tag{7-4}$$


1 To prove (7-4) more directly, we could begin by standardizing X̄, which then has the 
standard normal distribution. Thus from the standard normal tables: 

$$\Pr\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 95\% \tag{7-5}$$

In (7-5) the bracketed inequalities may be solved for μ, obtaining the equivalent inequalities 

$$\Pr\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 95\% \tag{7-6}$$

(7-4) proved 



FIG. 7-1 Distribution of sample mean X̄ ~ N(μ, σ²/n). (Note, μ is an unknown constant; 
we don't know what its value is; all we know is that, whatever μ may be, the variable X̄ is 
distributed around it as shown in this diagram.) 
[Figure: normal curve p(x̄) centered at μ (which is also μ_X̄), with the central 
95% range running from μ − 1.96σ/√n to μ + 1.96σ/√n.] 



FIG. 7-2 Construction of twenty interval estimates: a typical result. 
[Figure: the upper panel shows the distribution of the sample mean, 
X̄ ~ N(μ, σ²/n); this is what we know, but the statistician does not. The lower 
panel shows the statistician's interval estimates: his first (around x̄₁ = 70), 
his second (around x̄₂), his third (around x̄₃), and so on to his twentieth 
(around x̄₂₀). All bracket μ except the one around x̄₁₂, his one miss.] 
We must be exceedingly careful not to misinterpret (7-4). μ has not 
changed its character in the course of this algebraic manipulation. It has not 
become a variable; it remains a population constant. Equation (7-4), like 
(7-3), is a probability statement about the random variable X̄, or more 
precisely, the "random interval" X̄ − 1.96(σ/√n) to X̄ + 1.96(σ/√n). It is this 
interval that varies, not μ. 

To appreciate this fundamental point, let's return to our problem of 
constructing an interval estimate of average men's heights on our large 
campus. Moreover, to clearly illustrate what is going on, suppose we have 
some supernatural knowledge of the population μ (which we know to be 
69 inches) and σ (which we know to be 6 inches). Now let's just observe what 
happens when the statistician (poor mortal that he is) tries to estimate μ 
using (7-4) above. Just for the sake of illustration, let's suppose he makes 20 
such interval estimates, each time from a different random sample of 36. 
Figure 7-2 illustrates his typical experience. 

In that diagram we show the distribution of the sample mean. We know 
that X̄ is normal with mean equal to the population μ (69) and standard 
deviation equal to σ/√n = 6/√36 = 1 inch. Thus from (7-3) we know 
that there is a 95% probability that any X̄ will fall in the range 67 to 71 
inches. 

But the statistician doesn't know this; he blindly takes his first random 
sample, from which he computes the first mean x̄₁ to be 70. From (7-4) he 
calculates 2 the appropriate 95% confidence interval for μ: 

$$70 \pm 1.96\,\frac{6}{\sqrt{36}} \tag{7-7}$$

$$= 68 \text{ to } 72 \tag{7-8}$$

This interval estimate for μ is the first one shown in Figure 7-2. We note 
that in his first effort, the statistician is right; μ is enclosed in this interval. 
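The calculation in (7-7) and (7-8) is mechanical once x̄, σ and n are in hand. The sketch below is illustrative only; the function name `confidence_interval` is hypothetical, and σ is treated as known (6 inches), as the example assumes.

```python
import math


def confidence_interval(xbar, sigma, n, z=1.96):
    """Interval xbar +/- z * sigma / sqrt(n); z = 1.96 gives 95% confidence."""
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width


low, high = confidence_interval(70, 6, 36)
print(f"{low:.2f} to {high:.2f}")  # 68.04 to 71.96, rounded in the text to 68 to 72
```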

In his second sample, the statistician happens to draw a shorter group 
of individuals, and duly computes x̄₂ to be 68 inches. From a similar evaluation 
of (7-4) he comes up with his second interval estimate shown in the 
diagram, and so on. We observe that nineteen of these twenty estimates 
bracket the constant μ. Only one, the twelfth, does not; in this case he 
missed the mark, and was wrong. 



2 We gloss over one difficulty here. In evaluating (7-4) the statistician has an observed 
value for X̄ and knows that sample size n is 36. But there is one value he does not know: 
σ, the population standard deviation. All he can do is guess at it, and his best guess is 
the sample standard deviation, s, which we suppose he computes to be 6 inches. We deal 
at length with this problem later; but for now we can rest assured that s will be a reason- 
able approximation for σ in this problem. 
We can easily see why he was right most of the time. For each interval 
estimate he is simply adding and subtracting 2 inches to his sample mean; 
but this is the same ±2 inches that defines the range ab around μ. Thus, if 
and only if he observes a sample mean within the range ab, will his interval 
estimate bracket μ. Nineteen of his twenty sample means do fall in the range 
ab, and in all these instances his interval estimate was right. He was wrong 
only in the one instance when he observed a sample mean outside ab (i.e., 
x̄₁₂). 

In practice, of course, a statistician does not take many samples; he 
would only take one (e.g., x̄₁). And once this interval estimate is made, he 
is either right or wrong; this interval brackets μ or it does not. But the 
important point to recognize is that the statistician is using a method with a 
95% probability of success; this follows because there is a 95% probability 
that his observed X̄ will fall within the range ab, and as a consequence his 
interval estimate will bracket μ. This is what is meant by a 95% confidence 
interval: the statistician knows that in the long run, 95% of the intervals he 
constructs in this way will bracket μ. 
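The long-run claim can be illustrated by letting a computer play the statistician many times. This is a simulation sketch under stated assumptions: heights are drawn from a normal population with μ = 69 and σ = 6 as in the example, σ is treated as known, and the function name `coverage`, the number of trials and the seed are arbitrary choices.

```python
import math
import random


def coverage(mu=69.0, sigma=6.0, n=36, trials=10_000, z=1.96, seed=0):
    """Fraction of simulated intervals xbar +/- z*sigma/sqrt(n) that bracket mu."""
    rng = random.Random(seed)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        # One random sample of n heights and its sample mean.
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / trials


print(coverage())  # should be close to 0.95
```

Each simulated interval is either right or wrong; only the proportion of right ones, over many repetitions, is close to 95%.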

To review, we briefly emphasize the main points: 

1. The population parameter μ is constant, and remains constant. It is the 
interval estimate that is a random variable, because X̄ is a random variable. 
As long as X̄ is a random variable that can take on a whole range of values, it 
is referred to as an "estimator" of μ. 

2. But once the sample has been observed and X̄ takes on one specific 
value (e.g., x̄₁ = 70 inches) it is then called an "estimate" 3 of μ. Since it is 
no longer a random variable, probability statements are no longer strictly 
valid. For this reason when the estimate x̄ is substituted into (7-4), it is no 
longer called a 95% probability statement, but rather a 95% confidence 
interval: 

$$\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}} \tag{7-9}$$

Thus, our deduction in (7-4) that X̄ is within 1.96σ/√n of μ is "turned around" 
into the induction that μ is within 1.96σ/√n of the observed x̄. (7-9) is some- 
times abbreviated to 

95% confidence interval: 

$$\mu = \bar{x} \pm z_{.025}\frac{\sigma}{\sqrt{n}} \tag{7-10}$$

where z_.025 is the critical value leaving 2½% probability in the upper tail of 
the standard normal distribution. 

To recapitulate, once X̄ is observed to be x̄, then the "die is cast," and the 
interval estimate (7-9) will be either right for certain, or wrong for certain. 

3 For emphasis, the estimate is denoted by the lower case letter x̄, while the random es- 
timator is denoted by the capital letter X̄. We might call x̄ the realized value, and X̄ the 
potential value. 

3. Because of our omniscience, we know that the statistician erred only 
in his twelfth try. But he has no idea which estimates, if any, are wrong. All 
he knows is that he will be right 95% of the time, in the long run. 

4. As sample size is increased, the distribution of X̄ becomes more 
concentrated around μ (σ/√n decreases as n increases), and the confidence 
interval narrows (becomes more precise). 

5. If we wish to be more confident, e.g., 99% confident, of our con- 
clusions, then we must leave less of the probability in each tail in Figure 7-2; 
thus the range ab increases. Hence our interval estimate becomes less precise. 
Note how this point and the one preceding verify our casual observations in 
Chapter 1. 

6. An inference about the population parameter was feasible because 
we knew the distribution of its estimator X̄. This raises an interesting question, 
"Is it not possible that there are other statistics (for example, the sample 
median) that could be used to estimate μ? Why did we use the sample mean?" 
Intuitively, it seems preferable to estimate a mean with a mean. But there are 
stronger reasons, given in the next section. 
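Points 4 and 5 can be seen in numbers by tabulating the error allowance z·σ/√n for several sample sizes and confidence levels. This is a sketch only, assuming Python 3.8 or later for `statistics.NormalDist`; the function name `half_width` and the sample sizes are illustrative, with σ = 6 taken from the height example.

```python
import math
from statistics import NormalDist


def half_width(sigma, n, confidence=0.95):
    """Error allowance z * sigma / sqrt(n) for a two-sided interval."""
    tail = (1 - confidence) / 2               # probability left in each tail
    z = NormalDist().inv_cdf(1 - tail)        # critical value, about 1.96 for 95%
    return z * sigma / math.sqrt(n)


for n in (36, 144, 576):
    print(n, round(half_width(6, n, 0.95), 2), round(half_width(6, n, 0.99), 2))
```

Quadrupling n halves the allowance, while moving from 95% to 99% confidence widens it at every sample size.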



PROBLEMS 

7-1 An anthropologist measured the heights (in inches) of a random sample 
of 100 men from a certain population, and found the sample mean 
and variance to be 71 and 9 respectively. 

(a) Find a 95% confidence interval for the mean height μ of the whole 
population. 

(b) Find a 99% confidence interval. 

7-2 A research study examines the consumption expenditures (in thousands 
of dollars) of a random sample of 50 American families (all at the same 
income and asset level). The sample mean is 5.2 and the standard 
deviation is .72. Construct a 95% confidence interval for the mean 
consumption of all American families (at this income and asset level). 

7-3 The reaction times of 150 randomly selected drivers were found to have 
a mean of .83 sec and standard deviation of .20 sec. Find a 95% 
confidence interval for the mean reaction time of the whole population 
of drivers. 
(7-4) From a very large class in statistics, the following 40 marks were 
randomly selected: 

71 74 65 72 64 42 62 62 58 82 

49 83 58 65 68 60 76 86 74 53 

78 64 55 87 56 50 71 58 57 75 

58 86 64 56 45 73 54 86 70 73 

Construct a 95% confidence interval for the average mark of the 
whole class. (Hint. Reduce your work to manageable proportions by 
grouping into cells of width 5.) 
7-5 What is the probability that a statistician who constructs 20 independent 
95% confidence intervals will err: 

(a) Once (as in our example in Section 7-1)? 

(b) Not at all? 

(c) More than once? 

7-2 DESIRABLE PROPERTIES OF ESTIMATORS 

To be perfectly general, we consider any population parameter θ, and 
denote an estimator for it by θ̂. (In our special example in the preceding 
section, μ is the population parameter θ, and X̄ is its estimator θ̂.) We would 
like the random variable θ̂ to vary within only a narrow range around its 
fixed target θ (thus in our example in Figure 7-2, we should like the distribution 
of X̄ to be concentrated around μ, as close to μ as possible). We develop 
this notion of closeness in several ways. 

(a) No Bias 

An unbiased estimator is one that is, on the average, right on target, as 
shown in Figure 7-3a. Formally, we state 

Definition. 

$$\hat{\theta} \text{ is an unbiased estimator of } \theta \text{ if } E(\hat{\theta}) = \theta \tag{7-11}$$

For example, X̄ is an unbiased estimator of μ, because 

$$E(\bar{X}) = \mu \qquad \text{(6-10) repeated}$$

Of course, an estimator θ̂ is called biased if E(θ̂) is different from θ; in 
fact, bias is defined as this difference: 

Definition. 

$$\text{bias } B = E(\hat{\theta}) - \theta \tag{7-12}$$

FIG. 7-3 Comparison of a biased and unbiased estimator. (a) Unbiased estimator. 
(b) Biased estimator. 
[Figure: in panel (a) the distribution of θ̂ is centered on the true θ = E(θ̂); 
in panel (b) it is centered away from the true θ.] 

Bias is illustrated in Figure 7-3b. The distribution of θ̂ is "off target"; since 
E(θ̂) exceeds θ, there will be a tendency for θ̂ to over-estimate θ. 

As an example of a biased estimator, the sample mean squared deviation 

$$\text{MSD} = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2 \tag{7-13}$$

((2-5a) repeated) 

will on the average underestimate σ², the population variance. 4 But if we 
inflate it just a little, by dividing by n − 1 instead of n, we obtain the sample 
variance 

$$s^2 = \frac{1}{n - 1}\sum_{i=1}^{n}(X_i - \bar{X})^2 \tag{7-14}$$

((2-6) repeated) 

which has been proved an unbiased estimator of σ². (When we say "has been 
proved," we mean that it has been proved in advanced texts. If it has been 
proved in this text, we shall usually say "we have proved.") The student who 
was puzzled by our division by n − 1 in defining s² in Chapter 2 can now see 
why: we want to use this sample variance as an unbiased estimator of the 
population variance. 

Both the sample mean and median are unbiased estimators of μ in a 
normal population; thus, in judging which is to be preferred, we must 
examine their other characteristics. 

4 This underestimation can be seen very easily in the case of n = 1. Then X̄ coincides with 
X₁, so that Eq. (7-13) gives MSD = 0, which is an obvious underestimate of σ². 

On the other hand, Eq. (7-14), when n = 1, gives s² = 0/0, which is undefined. 
But this is not a drawback; in fact, it is a good way to warn the unwary that since a sample 
of just one observation has no "spread," it cannot estimate the population variance σ² 
(assuming μ is unknown, of course). 

FIG. 7-4 A comparison of an efficient and inefficient estimator (both are unbiased). 
(a) Efficient. (b) Inefficient. 
[Figure: two distributions p(θ̂), each centered on the true θ; the one in 
panel (a) is more concentrated than the one in panel (b).] 
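The bias of the MSD in (7-13), and its removal by the n − 1 divisor in (7-14), can be seen in a small simulation. This is a sketch under assumptions not in the text: samples of size n = 5 are drawn from a normal population with σ² = 1, and the function name `average_msd_and_s2`, the number of trials and the seed are arbitrary.

```python
import random


def average_msd_and_s2(n=5, sigma=1.0, trials=20_000, seed=0):
    """Average value of MSD (divide by n) and s^2 (divide by n - 1) over many samples."""
    rng = random.Random(seed)
    total_msd = total_s2 = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        ss = sum((x - xbar) ** 2 for x in xs)  # sum of squared deviations
        total_msd += ss / n
        total_s2 += ss / (n - 1)
    return total_msd / trials, total_s2 / trials


msd_avg, s2_avg = average_msd_and_s2()
print(round(msd_avg, 3), round(s2_avg, 3))
```

Since MSD = [(n − 1)/n] s², the average MSD should settle near (n − 1)/n · σ² = .8 here, while the average s² should settle near σ² = 1.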

(b) Efficiency 

As well as being on target on the average, we should also like the distribution 
of an estimator θ̂ to be concentrated, that is, to have a small variance. 
This is the notion of efficiency, shown in Figure 7-4. We describe θ̂ as more 
efficient because it has smaller variance. A useful relative measure of the 
efficiency of two unbiased 5 estimators is: 

Definition. 

$$\text{Relative efficiency of } \hat{\theta} \text{ compared to } \tilde{\theta} = \frac{\operatorname{var} \tilde{\theta}}{\operatorname{var} \hat{\theta}} \tag{7-15}$$

5 For biased estimators, the definition of efficiency is 

$$\frac{E(\tilde{\theta} - \theta)^2}{E(\hat{\theta} - \theta)^2}$$

which of course is (7-15) if both estimators have 0 bias. 

An estimator which is more efficient than any other is called absolutely 
efficient, or simply "efficient." 

Finally, we are in a position to pass judgement on the merits of the 
sample mean and median as estimators of μ. In sampling from a normal 
population, X̄ has been proved to be the efficient estimator of μ. We have 
already established that its variance is σ²/n. On the other hand, the sample 
median has been shown to have, for large n, a variance of 

$$(\pi/2)(\sigma^2/n) \tag{7-16}$$

Hence in a large sample, the relative efficiency of the sample mean compared 
to the median is derived from (7-15) as: 

$$\frac{(\pi/2)(\sigma^2/n)}{\sigma^2/n} = \pi/2 \simeq 1.57 \tag{7-17}$$

Because it is half again more efficient, X̄ is preferred. It will give us a point 
estimate that will tend to be closer to the target; or, it will give us a more 
precise (i.e., smaller range) interval estimate. Of course, by increasing sample 
size (n) we can reduce the variance of either estimator. Therefore, an alternative 
way of looking at the greater efficiency of the sample mean is to recognize 
that the sample median will yield as accurate a point or interval estimate only 
if we take a larger sample. Hence, using the sample mean is more efficient, 
because it costs less to sample; note how the economic and statistical definitions 
of efficiency coincide. 
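The large-sample ratio π/2 can be approximated by simulation. The sketch below is illustrative only: it assumes a standard normal population, a sample size of n = 101 and 4,000 repetitions, and the function name `variance_ratio` is hypothetical. It requires Python 3.8 or later for `statistics.fmean`.

```python
import random
import statistics


def variance_ratio(n=101, trials=4_000, seed=0):
    """Estimate var(sample median) / var(sample mean) for normal samples."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(xs))
        medians.append(statistics.median(xs))
    return statistics.pvariance(medians) / statistics.pvariance(means)


print(round(variance_ratio(), 2))  # should be in the neighborhood of pi/2
```

Because the ratio is itself estimated from a finite number of samples of finite size, it will only be near π/2, not equal to it.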

(c) Consistency 

Roughly speaking, a consistent estimator is one that concentrates completely
on its target as sample size increases indefinitely, as sketched in
Figure 7-5. In the limiting case, as the sample size becomes infinite, a
consistent estimator $\hat{\theta}$ will provide a perfect point estimate of the target $\theta$.

We now state consistency more precisely. Just as the variance was a
good measure of the spread of a distribution about its mean, so the

$$\text{mean squared error} = E(\hat{\theta} - \theta)^2 \tag{7-18}$$

is a good measure of how the distribution of $\hat{\theta}$ is spread about its target value
$\theta$. Consistency requires this to be zero in the limit:

Definition.

$\hat{\theta}$ is consistent⁶ if

$$E(\hat{\theta} - \theta)^2 \to 0 \quad \text{as } n \to \infty \tag{7-19}$$



6 This definition is sometimes called "consistency in mean-square." It implies a condition
called "consistency in probability": for any positive $\delta$ (no matter how small),

$$\Pr(|\hat{\theta} - \theta| < \delta) \to 1 \quad \text{as } n \to \infty \tag{7-20}$$

This is often taken as the definition of consistency.

FIG. 7-5 A consistent estimator, showing how the distribution of $\hat{\theta}$ concentrates on its
target $\theta$ as $n$ increases. [Figure: sampling distributions $p(\hat{\theta})$ for increasing
sample sizes (curves labeled by $n$, e.g. $n = 50$ and $100$), narrowing around the true $\theta$.]

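Mean squared error is related to bias and variance by the following
theorem.⁷

Theorem.

$$E(\hat{\theta} - \theta)^2 = \sigma_{\hat{\theta}}^2 + b_{\hat{\theta}}^2 \tag{7-21}$$

where $\sigma_{\hat{\theta}}^2$ is the variance of $\hat{\theta}$ and $b_{\hat{\theta}} = \mu_{\hat{\theta}} - \theta$ is its bias.

Corollary.

$\hat{\theta}$ is a consistent estimator iff⁸ its variance
and bias both approach zero, as $n \to \infty$. (7-22)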



If only the bias approaches zero, the estimator is called "asymptotically
unbiased," a condition that is clearly weaker than⁹ consistency.

Consistency does not guarantee that an estimator is a good one. For
example, as an estimator of $\mu$ in a normal population, the sample median is
consistent.¹⁰ But it is not a good estimator; the sample mean is preferred
because it is both consistent and efficient.

As a final example, the sample MSD is a consistent estimator of $\sigma^2$. It
is true that it is a biased estimator; but as $n \to \infty$, this bias disappears,
i.e., it is asymptotically unbiased.¹¹ Since it can also be proven that its variance
tends to zero, the conditions of corollary (7-22) are satisfied. This concept of
a biased, yet consistent estimator is a very important one, for example, in
econometrics.

7 Proof.

$$\begin{aligned}
E(\hat{\theta} - \theta)^2 &= E[(\hat{\theta} - \mu_{\hat{\theta}}) + (\mu_{\hat{\theta}} - \theta)]^2 \\
&= E(\hat{\theta} - \mu_{\hat{\theta}})^2 + 2(\mu_{\hat{\theta}} - \theta)E(\hat{\theta} - \mu_{\hat{\theta}}) + (\mu_{\hat{\theta}} - \theta)^2 \\
&= \sigma_{\hat{\theta}}^2 + 0 + (\mu_{\hat{\theta}} - \theta)^2 \\
&= \sigma_{\hat{\theta}}^2 + b_{\hat{\theta}}^2
\end{aligned}$$

8 Iff is an abbreviation for "if and only if."

9 Asymptotic unbiasedness is also a weaker condition than unbiasedness, since the latter
applies for all $n$, not just $n \to \infty$.



PROBLEMS 

7-6 True or false? If false, correct it.

(a) The sample proportion $P$ is an unbiased estimator of the
population proportion $\pi$.

(b) $\mu$ is a random variable (varying from sample to sample), and is
used to estimate the parameter $\bar{X}$.

7-7 Based on a sample of 2 observations, consider the two estimators of $\mu$:

$$\bar{X} = \tfrac{1}{2}X_1 + \tfrac{1}{2}X_2$$

and

$$W = \tfrac{1}{3}X_1 + \tfrac{2}{3}X_2$$

(a) Prove they are unbiased.

(b) What is the efficiency of $W$ relative to $\bar{X}$? Which estimator is
preferable?

7-8 A farmer has a square field, whose area he wants to estimate. When
he measures the length of the field, he makes a random error, so that
his observed length $O_1$ is a normal variate centered at 200 (the true but
unknown value) with $\sigma = 20$. Worried about his possible error, he
decides to take a second observation $O_2$ and average. But he is in a
dilemma as to how to proceed:

(1) Should he average $O_1$ and $O_2$, and then square? or

(2) Should he square first, and then average?

Mathematically, it's a question whether

$$\left(\frac{O_1 + O_2}{2}\right)^2 \quad \text{or} \quad \frac{O_1^2 + O_2^2}{2}$$

is best.

(a) Are methods (1) and (2) really different, or are they just 2 different
ways of saying the same thing?

(Hint. Try a couple of actual values, like $O_1 = 230$ and $O_2 = 200$, and
work out (1) and (2).)

(b) If they are different, which has less bias?

(Hints. This problem will actually be easier if you avoid arithmetic
by generalizing from a length of 200 feet to a length of $\mu$, and also use
general $\sigma$. Furthermore, the normality is irrelevant to questions of
expectation. Finally, try using equation (4-5): $E(X^2) = \mu^2 + \sigma^2$.)

(c) Generalize answer (b) to a sample of $n$ measurements.

10 To prove consistency, we use corollary (7-22), noting that the sample median has zero
bias, and a variance given by (7-16) which approaches zero.

11 To establish this, we note that

$$\text{MSD} = \left(\frac{n-1}{n}\right)s^2$$

Thus MSD $\to s^2$ as $n \to \infty$. Since $s^2$ is unbiased (for any $n$), it follows that MSD is unbiased
as $n \to \infty$.

7-9 As in Problem 6-5b, consider a bowl full of many chips: one-third
marked 2, one-third marked 4, and one-third marked 6. When a
sample of 2 chips is drawn, construct the probability table of $\bar{X}$, and
hence

(a) Show (once more) that $\bar{X}$ is an unbiased estimator of $\mu$.

(b) Is $(2\bar{X} + 1)$ an unbiased estimator of $(2\mu + 1)$?

(c) Is $(\bar{X})^2$ an unbiased estimator of $\mu^2$? (Compare Problem 7-8.)

(d) Is $1/\bar{X}$ an unbiased estimator of $1/\mu$?

How could you have answered parts (a), (b), and (c) theoretically,
without going through all the computations?
7-10 To illustrate bias very concretely, consider a sample of $n = 2$ tosses
from the population of all tosses of a fair die. The population moments
are easily computed:

$$\mu = 3.5, \qquad \sigma^2 = 35/12$$

We shall study sample estimators in 2 ways.

(a) Empirical approach (Monte-Carlo technique). Repeat the experiment
many times. (You can simulate the roll of 2 dice with the
random digits of Appendix Table II. If each student does it, say, 5
times, and the results from the class are pooled, this would save work.)
The result will be a table like:

    Result of
    2 Tosses      X-bar     MSD      s^2
    (3, 1)        2         1        2
    (2, 5)        3.5       2¼       4½
    ...
    Averages

It may be convenient to array the data in a relative frequency table.
Then answer

(1) Does $\bar{X}$ average close to $\mu$?

(2) Does $s^2$ average close to $\sigma^2$?

(3) Does MSD average close to $\sigma^2$?

(b) Theoretical approach. In (a), if the experiment were repeated
endlessly, the relative frequencies would settle down to probabilities.
But these probabilities can be calculated very easily, by exploiting the
symmetry of the dice. After calculating the relevant probability table,
find

(1) $E(\bar{X}) \stackrel{?}{=} \mu$

(2) $E(s^2) \stackrel{?}{=} \sigma^2$

(3) $E(\text{MSD}) \stackrel{?}{=} \sigma^2$
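
The theoretical approach of Problem 7-10(b) can also be carried out by brute
force. The Python sketch below is our own illustration, not part of the text:
the function name `expected_estimators` is hypothetical. It enumerates the 36
equally likely results of 2 tosses of a fair die and averages $\bar{X}$, MSD, and
$s^2$ exactly, using fractions so that the three expectations can be compared
with $\mu = 3.5$ and $\sigma^2 = 35/12$ without rounding.

```python
from fractions import Fraction
from itertools import product


def expected_estimators():
    """Return (E[X-bar], E[MSD], E[s^2]) for n = 2 tosses of a fair die."""
    outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely pairs
    prob = Fraction(1, len(outcomes))
    e_mean = e_msd = e_s2 = Fraction(0)
    for x1, x2 in outcomes:
        xbar = Fraction(x1 + x2, 2)
        # Sum of squared deviations about the sample mean.
        ss = (x1 - xbar) ** 2 + (x2 - xbar) ** 2
        e_mean += prob * xbar
        e_msd += prob * ss / 2        # MSD divides by n = 2
        e_s2 += prob * ss / (2 - 1)   # s^2 divides by n - 1 = 1
    return e_mean, e_msd, e_s2


if __name__ == "__main__":
    for name, value in zip(("E(X-bar)", "E(MSD)", "E(s^2)"), expected_estimators()):
        print(name, "=", value)
```

Comparing the printed fractions with the population moments shows directly
which of the three estimators is biased for a sample of this size.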

*7-3 MAXIMUM-LIKELIHOOD ESTIMATION (MLE) 
(a) Introduction 

The next question is, "Does some technique exist for finding estimators
with these attractive characteristics?" The maximum-likelihood method is
the technique that statisticians most often use. We introduce it with an
example of sampling from a Bernoulli population; to be concrete, suppose we
flip a biased coin 10 times in order to estimate $\pi$, the population proportion¹²
of heads, and get 4 heads. We shall temporarily forget the common-sense
solution (estimate $\pi$ with the sample proportion $P = 4/10$) in order to
develop some general ideas.

With 4 out of 10 heads before us, we ask, "Is .1 a reasonable estimate of
$\pi$?" If $\pi$ were .1, then the probability of four heads (successes) in our ten
tosses (trials) would be, according to the binomial formula

$$\binom{10}{4}(.1)^4(.9)^6 \approx .011$$

In other words, if $\pi = .1$, there is only about one chance in a hundred
that we would get the sample we observed.

Similarly, we might ask ourselves how likely our result of four heads
would be if $\pi$ were .8. The student can verify that the probability of getting
4 heads from this sort of population is only .006; again it seems implausible
that a population with $\pi = .8$ would yield the sample result we observed.

12 This is also, of course, the population probability of heads. But, for simplicity, we refer
hereafter only to the "proportion."



Table 7-2 Outline of Maximum Likelihood Estimation (MLE)

Column 1. Binomial: Special Case in Text

Given: 4 successes in 10 trials.

Find: MLE of $\pi$.

As follows: The probability of 4 heads in 10 trials is

$$p(4; \pi) = \binom{10}{4}\pi^4(1 - \pi)^6$$

But in this estimation problem the sample values 10 and 4 have already been
observed, hence are given. This therefore must be written as a likelihood
function of $\pi$ only:

$$L(\pi) = \binom{10}{4}\pi^4(1 - \pi)^6$$

    pi       L(pi)
    0        0
    .1       .011
    .2       .088
    .3       .200
    .4       .251   <= max
    .5       .205
    .6       .111
    .7       .037
    .8       .006
    1.0      0

Conclude: MLE of $\pi$ is .4, which is the sample proportion $P$.

Column 2. Binomial: General Case

Given: $x$ successes in $n$ trials.

Find: MLE of $\pi$.

As follows: Our $x$ successes in $n$ trials were generated by:

$$p(x; \pi) = \binom{n}{x}\pi^x(1 - \pi)^{n-x}$$

Since $n$ and $x$ are fixed at their observed values we can write this as a
likelihood function of $\pi$ only:

$$L(\pi) = \binom{n}{x}\pi^x(1 - \pi)^{n-x}$$

At what value of $\pi$ is this function a maximum? Calculus shows that this
occurs when $\pi = x/n = P$, the sample proportion.

Conclude: Hence, the MLE of $\pi$ is $P$.

Column 3. MLE of $\mu$ from a Sample Drawn from a Normal Population, $p(x; \mu)$

Given: Sample values $x_1, x_2, x_3$.

Find: MLE of $\mu$.

As follows: Probability of our sample resulting from any $\mu$ is

$$p(x_1, x_2, x_3; \mu) = \prod_{i=1}^{3} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(1/2\sigma^2)(x_i - \mu)^2}$$

But with $(x_1, x_2, x_3)$ fixed at their observed values, the above becomes the
likelihood function $L(\mu)$, an expression in $\mu$:

$$L(\mu) = \prod_{i=1}^{3} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(1/2\sigma^2)(x_i - \mu)^2}$$

Try out all possible values of $\mu$, selecting that one that maximizes this
function. Calculus shows that this is:

$$\mu = \tfrac{1}{3}(x_1 + x_2 + x_3) = \text{the sample mean.}$$

Conclude: Thus $\bar{X}$ is the MLE of $\mu$.

Column 4. MLE of any Parameter $\theta$ from any Population $p(x; \theta)$

Given: Sample values $x_1, x_2, \ldots, x_n$.

Find: MLE of $\theta$.

As follows: Probability of our sample, for any $\theta$, is:

$$p(x_1, x_2, \ldots, x_n; \theta) = p(x_1; \theta)\,p(x_2; \theta) \cdots p(x_n; \theta) = \prod_{i=1}^{n} p(x_i; \theta)$$

But with $(x_1, x_2, \ldots, x_n)$ fixed at their observed values, the above
function becomes $L(\theta)$, an expression in $\theta$:

$$L(\theta) = \prod_{i=1}^{n} p(x_i; \theta)$$

Conclude: Select the value of $\theta$ that maximizes this likelihood function.

Similarly, we consider all the other possible values for $\pi$, in each case
asking how likely it is that this $\pi$ would yield the sample that we in fact
observed. The results are shown in the first column of Table 7-2, and graphed
fully in Figure 7-6. We refer to this as the likelihood function, when the
sample values of 4 and 10 are fixed, and the only variable in the function is
the hypothetical value of $\pi$. For emphasis, we often write this as a function of
$\pi$ alone:

$$L(\pi) = \binom{10}{4}\pi^4(1 - \pi)^6$$

The maximum likelihood estimate ($\pi = .4$) is the value maximizing this
likelihood function. In general:

Definition.

The MLE is the hypothetical population value which
maximizes the likelihood of the observed sample. (7-23)

We note:

(a) The sample proportion $P$ is our MLE of the population proportion
$\pi$; it is often, but not always, the case that the corresponding sample value is
the MLE of the population parameter.

(b) Figure 7-6 is the likelihood function for the particular sample we
observed (i.e., 4 heads in 10 tosses). A different sample result would call for
a different likelihood function, and hence a different MLE.
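
The likelihood values tabulated for the coin example can be reproduced with a
few lines of Python. This is a minimal sketch under our own assumptions: the
names `likelihood`, `grid`, `table`, and `mle` are hypothetical, and the grid of
hypothetical $\pi$ values in steps of .1 is chosen only to mirror the table. A grid
search of this kind finds the best value on the grid, not necessarily the exact
maximum.

```python
from math import comb


def likelihood(pi, x=4, n=10):
    """Binomial likelihood of pi, with the sample (x successes in n trials) fixed."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)


# Hypothetical values of pi to "try out": 0, .1, .2, ..., 1.0
grid = [k / 10 for k in range(11)]

# Likelihood of each hypothetical pi, rounded to three places as in Table 7-2.
table = {pi: round(likelihood(pi), 3) for pi in grid}

# The grid value with the greatest likelihood.
mle = max(grid, key=likelihood)

if __name__ == "__main__":
    for pi, value in table.items():
        print(pi, value)
    print("MLE on this grid:", mle)
```

Changing `x` and `n` gives the likelihood function for a different sample, and
hence possibly a different maximizing value.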



FIG. 7-6 An example of a likelihood function. $L(\pi)$, the likelihood that various
hypothetical population proportions would yield the sample we observed. [Figure: $L(\pi)$
plotted against hypothetical values of the population proportion $\pi$, from 0 to 1.0, with
the point giving the maximum marked.]

FIG. 7-7 The binomial probability function $p(x; \pi)$ plotted against both $x$ and $\pi$.
[Figure: $L(\pi)$ or $p(x; \pi)$ drawn over two axes, $x$ = number of successes in 10 trials and
$\pi$ = population proportion. Slice a: $p(x; \pi)$, probability of $x$ successes in 10 trials,
given $\pi = .8$. Slice b: $L(\pi)$, likelihood of various hypothetical values of $\pi$ yielding the
given sample of 4 successes in 10 trials (same as in Fig. 7-6).]

In Figure 7-7 this discussion is related to our previous deductions about
the binomial in Chapters 3, 4, and 6. In this figure we graph the binomial
probabilities $p(x; n, \pi)$. [Since $n$ is set at 10 regardless, this function is referred
to simply as $p(x; \pi)$.] In earlier chapters we regarded $\pi$ fixed and $x$ variable,
as in slice a; thus the dotted function shows the probability of various numbers
of heads if the population proportion is given as .8. In this chapter we
regard $x$ (the observed sample result) as fixed, while the population $\pi$ is
thought of as taking on a whole set of hypothetical values; thus slice b shows
the likelihood that various possible population proportions would yield 4
heads. Slices in the a direction are referred to as probability functions, while
slices in the b direction are called likelihood functions.
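
The distinction between the two kinds of slice can be made concrete numerically.
The following Python sketch is our own illustration (the names `binom_pmf`,
`slice_a`, and `slice_b` are hypothetical): it evaluates the same binomial
formula once with $\pi$ fixed at .8 and $x$ varying, and once with $x$ fixed at 4 and
$\pi$ varying over a grid. The first slice is a probability function and its values
add to 1; the second is a likelihood function, and its values on the grid need
not add to 1.

```python
from math import comb


def binom_pmf(x, n, pi):
    """Binomial probability of x successes in n trials with proportion pi."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)


n = 10

# Slice a: pi fixed at .8, x varies over 0..10 (a probability function).
slice_a = [binom_pmf(x, n, 0.8) for x in range(n + 1)]

# Slice b: x fixed at 4, pi varies over 0, .1, ..., 1.0 (a likelihood function).
slice_b = [binom_pmf(4, n, k / 10) for k in range(11)]

if __name__ == "__main__":
    print("sum over x  (slice a):", sum(slice_a))
    print("sum over pi (slice b):", sum(slice_b))
```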

We now generalize maximum likelihood estimation. (A summary of our
results is shown in the last three columns of Table 7-2 for reference.)



(b) General Binomial 

It is very easy to show that our result in the previous section was no
accident, and that the maximum likelihood estimate of the binomial $\pi$ is
always the sample proportion $P$.

Given any observed sample of $x$ successes in $n$ trials, the likelihood
function is

$$L(\pi) = \binom{n}{x}\pi^x(1 - \pi)^{n-x} \tag{7-24}$$

With calculus it can easily be shown¹³ that the maximum value of this
likelihood function occurs when $\pi = x/n = P$. Thus

MLE of $\pi$ = $P$, the sample proportion.

We argued in Chapter 1 that it is reasonable to use the sample proportion
to estimate the population proportion; but in addition to its intuitive appeal,
we now add the more rigorous justification of maximum likelihood: a population
with $\pi = P$ would generate with the greatest likelihood the sample we
observed.



13 To find where $L(\pi)$ is a maximum, set the derivative equal to zero.

$$\frac{dL(\pi)}{d\pi} = \binom{n}{x}\left[\pi^x(n - x)(1 - \pi)^{n-x-1}(-1) + x\pi^{x-1}(1 - \pi)^{n-x}\right] = 0 \tag{7-25}$$

Dividing by $\binom{n}{x}\pi^{x-1}(1 - \pi)^{n-x-1}$, (7-25) becomes:

$$\begin{aligned}
-\pi(n - x) + x(1 - \pi) &= 0 \\
-n\pi + x &= 0 \\
\pi &= \frac{x}{n}
\end{aligned}$$

You can easily confirm that this is a maximum (rather than a minimum or inflection point).

(c) MLE of the Mean ($\mu$) of any Normal Population

Suppose we have drawn a sample $(x_1, x_2, x_3)$ from a parent population
which is $N(\mu, \sigma^2)$; our problem is to find the MLE of the unknown $\mu$ for this
sample. Because the population is normal, the probability of getting any
value $x$, given a population mean $\mu$ is:

$$p(x; \mu) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(1/2\sigma^2)(x - \mu)^2} \tag{7-26}$$

Specifically, the probability that we would get the value $x_1$ that we observed
in our first sample draw is

$$p(x_1; \mu) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(1/2\sigma^2)(x_1 - \mu)^2} \tag{7-27}$$

while the probabilities of drawing the values $x_2$ and $x_3$ are, respectively

$$p(x_2; \mu) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(1/2\sigma^2)(x_2 - \mu)^2} \tag{7-28}$$

and

$$p(x_3; \mu) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(1/2\sigma^2)(x_3 - \mu)^2} \tag{7-29}$$

We assume as usual that $X_1$, $X_2$, and $X_3$ are independent so that the
joint probability function is the product of (7-27), (7-28), and (7-29):

$$p(x_1, x_2, x_3; \mu) = \prod_{i=1}^{3}\left[\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(1/2\sigma^2)(x_i - \mu)^2}\right] \tag{7-30}$$

where $\prod$ means "the product of," just as $\sum$ means "the sum of." But in our
estimation problem the sample values $x_i$ are fixed and only $\mu$ is thought of as
varying over hypothetical values; we shall speculate on these various possible
values of $\mu$, with a view to selecting the most plausible. Thus (7-30) can be
written as the likelihood function:

$$L(\mu) = \prod_{i=1}^{3}\left[\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(1/2\sigma^2)(x_i - \mu)^2}\right] \tag{7-31}$$

The MLE of $\mu$ is defined as the hypothetical value of $\mu$ which maximizes the
likelihood function (7-31). Its value may be derived with calculus, but we
consider only a geometric interpretation in Figure 7-8.

FIG. 7-8 Maximum likelihood estimation of the mean ($\mu$) of a normal population, based
on three sample observations $(x_1, x_2, x_3)$. (a) Small likelihood $L(\mu_*)$, the product of the
three ordinates. (b) Large likelihood $L(\mu_0)$. [Figure: two panels, each showing a normal
curve, $p(x; \mu_*)$ in (a) and $p(x; \mu_0)$ in (b), with ordinates drawn above $x_1$, $x_2$, $x_3$.]

We "try out" two hypothetical values of $\mu$. We note that a population
with mean $\mu_*$ as in Figure 7-8a is not very likely to yield the sample we
observed. Although the probabilities of $x_1$ and $x_2$ are large, the probability of
$x_3$ (i.e., the ordinate above $x_3$) is very small because it is so far distant from
$\mu_*$. The product of all three probabilities [i.e., the likelihood of a population
with mean $\mu_*$ generating the sample $(x_1, x_2, x_3)$] is therefore quite small. On
the other hand, a population with mean $\mu_0$ as in Figure 7-8b is more likely to
generate the sample values. Since the $x$ values are collectively closer to $\mu_0$,
they have a greater joint probability. Thus the likelihood is greater for
$\mu_0$ than for $\mu_*$; indeed, very little additional shift in $\mu_0$ is apparently required
to maximize the likelihood of the sample. It seems that the MLE of $\mu$ might
be the sample mean, i.e., the average value of $x_1$, $x_2$, and $x_3$; this can, in
fact, be proved, as in Problem 7-12.

Finally, the reader who has carefully learned that $\mu$ is a fixed population
parameter may wonder how it can appear in the likelihood function (7-31)
as a variable. This is simply a mathematical convenience. The true value of
$\mu$ is, in fact, fixed. But since it is unknown, in MLE we must consider all of
its possible, or hypothetical, values; the way to do this mathematically is to
treat it as a variable.
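
The idea of "trying out" hypothetical values of $\mu$ can be imitated numerically.
The Python sketch below is only an illustration under our own assumptions: the
three observations in `xs`, the value $\sigma = 1$, the grid of candidate means, and
the names `normal_likelihood` and `mle` are all hypothetical. It evaluates the
product of the three normal ordinates for each candidate $\mu$ and picks the
candidate with the greatest likelihood, which can then be compared with the
sample mean.

```python
from math import exp, pi, sqrt


def normal_likelihood(mu, xs, sigma=1.0):
    """Product of the normal ordinates above each observation, for a given mu."""
    like = 1.0
    for x in xs:
        like *= exp(-((x - mu) ** 2) / (2 * sigma**2)) / sqrt(2 * pi * sigma**2)
    return like


# Hypothetical sample of three observations (not from the text).
xs = [4.0, 5.0, 9.0]

# Candidate values of mu: 0.00, 0.01, ..., 12.00
grid = [k / 100 for k in range(0, 1201)]

# Candidate with the greatest likelihood on this grid.
mle = max(grid, key=lambda mu: normal_likelihood(mu, xs))

if __name__ == "__main__":
    print("grid MLE of mu:", mle)
    print("sample mean   :", sum(xs) / len(xs))
```

The choice of $\sigma$ changes the height of the likelihood but not the location of
its maximum, so the same $\mu$ is selected whatever value is assumed.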



(d) MLE of any Parameter from any Population 

We now state MLE in its full generality. A sample $(x_1, x_2, \ldots, x_n)$ is
drawn from a population with probability function $p(x; \theta)$, where $\theta$ is any
unknown population parameter that we wish to estimate. From our definition
of random sampling (with replacement, or from an infinite population), the
$x_i$ are independent, each with the probability function $p(x_i; \theta)$; hence the
joint probability of the whole sample is obtained by multiplying.

$$p(x_1, x_2, \ldots, x_n; \theta) = p(x_1; \theta)\,p(x_2; \theta) \cdots p(x_n; \theta) = \prod_{i=1}^{n} p(x_i; \theta) \tag{7-32}$$

But we regard the observed sample values as fixed, and ask, "Which of all
the hypothetical values of $\theta$ maximizes this probability?" This is emphasized
by renaming (7-32) the likelihood function:

$$L(\theta) = \prod_{i=1}^{n} p(x_i; \theta) \tag{7-33}$$

The MLE is that hypothetical value of $\theta$ that maximizes this likelihood
function.
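
The general recipe in (7-33) can be sketched in Python for any probability
function supplied by the user. Everything named here is our own assumption
rather than part of the text: the helper names `log_likelihood` and
`mle_on_grid`, the Bernoulli example, the ten recorded tosses, and the grid of
candidate values. For numerical convenience the sketch maximizes the logarithm
of the likelihood instead of the likelihood itself; since the logarithm is an
increasing function, the same $\theta$ is selected.

```python
from math import log


def log_likelihood(theta, xs, p):
    """Logarithm of L(theta): the sum of log p(x_i; theta) over the sample."""
    return sum(log(p(x, theta)) for x in xs)


def mle_on_grid(xs, p, grid):
    """Return the candidate theta in grid with the greatest likelihood."""
    return max(grid, key=lambda theta: log_likelihood(theta, xs, p))


def bernoulli(x, pi):
    """Probability function of one toss: x = 1 (head) or 0 (tail)."""
    return pi if x == 1 else 1 - pi


# Hypothetical record of 10 tosses containing 4 heads.
tosses = [1, 0, 0, 1, 0, 1, 0, 0, 0, 1]

# Candidate proportions .01, .02, ..., .99 (end points omitted to avoid log 0).
grid = [k / 100 for k in range(1, 100)]

estimate = mle_on_grid(tosses, bernoulli, grid)

if __name__ == "__main__":
    print("grid MLE of pi:", estimate)
```

Replacing `bernoulli` by another probability function, and the grid by suitable
candidate values of its parameter, applies the same search to a different
population.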



(e) Maximum Likelihood vs Method of Moments Estimation 
(MLE versus MME) 

In the analysis above, we have estimated a population proportion with
a sample proportion, and a population mean with a sample mean. Why not
always use this technique, and estimate any population parameter with the
corresponding sample value? This is known as method of moments estimation
(MME). Its great advantage is that it is plausible and easy to understand.
Moreover, MLE and MME often coincide.

But suppose the two methods do differ (as in Problem 7-14)? In such a
circumstance MLE is usually superior. The intuitive appeal of MME is more
than offset by the following impressive advantages of MLE. Since MLE is
the population value most likely to generate the sample values observed, it is
in some sense the population value that "best matches" the observed sample.
In addition, under broad conditions MLE has the following asymptotic
properties:

1. Efficient, with smaller variance than any other estimators.

2. Consistent, that is, asymptotically unbiased, with variance tending
to zero.

3. Normally distributed, with easily computed mean and variance; hence
it may be readily used to make inferences.

For example, we have already seen that these three properties are true
for $\bar{X}$, the MLE of $\mu$ in a normal population. [Property 3 follows from
Theorem (6-13); Property 2 follows from (6-10) and (6-11); Property 1 is
proved in advanced texts, and has been alluded to in (7-17).]

We emphasize that these properties are asymptotic, that is, true for
large samples as $n \to \infty$. But for the small samples often used by economists,
for example, MLE is not necessarily best.

PROBLEMS 

*7-11 Following Figure 7-6, graph the likelihood function for a sample of
6 heads in 8 tosses of a coin; show the MLE.

*7-12 Derive the MLE of $\mu$ for a normal population, using calculus.

* 7 " 13 know ° f "* n ° rmal distribution > as ^™ng M ^ 

(b) Is it unbiased?

*7-14 As $N + 1$ delegates arrived at a convention, they were given successive
tags numbered $0, 1, 2, 3, \ldots, N$. In order to estimate the unknown
number $N$, a brief walk in the corridor provided a sample of 5 tags
numbered 37, 16, 44, 43, 22.

(a) What is the MME of $N$? Is it biased?

(b) What is the MLE of $N$? Is it biased?



FURTHER READING 

For a detailed description of the virtues of MLE, see for example

1. Wilks, S. S. Mathematical Statistics, New York: John Wiley & Sons (1962).

2. Lindgren, B. W. Statistical Theory, New York: Macmillan (1959).



chapter 8 
Estimation II 



8-1 DIFFERENCE IN TWO MEANS 

In the previous chapter, we used a sample mean to estimate a population 
mean. In this chapter we will develop several other similar examples of how 
a sample statistic is used to estimate a population parameter. 

Whenever two population means are to be compared, it is usually their
difference that is important, rather than their absolute values. Thus we often
wish to estimate

$$\mu_1 - \mu_2 \tag{8-1}$$

A reasonable estimate of this difference in population means is the difference
in sample means

$$\bar{X}_1 - \bar{X}_2 \tag{8-2}$$

(Assuming normality of the parent populations, this is the maximum likelihood
estimator, with many attractive properties.)

Again, because of the error in point estimates, we are typically interested
in an interval estimate. Its development is comparable to the argument in
Section 7-1, and involves two steps: the distribution of our estimator
$(\bar{X}_1 - \bar{X}_2)$ must be deduced; then this can be "turned around" to make an
inference about the population parameter $(\mu_1 - \mu_2)$.

First, how is the estimator $(\bar{X}_1 - \bar{X}_2)$ distributed? From (6-31) we know
that the first sample mean $\bar{X}_1$ is approximately normally distributed around
the population mean $\mu_1$ as follows.

$$\bar{X}_1 \sim N(\mu_1, \sigma_1^2/n_1) \tag{8-3}$$

where $\sigma_1^2$ represents the variance of the first population, and $n_1$ the size of
the sample drawn. Similarly

$$\bar{X}_2 \sim N(\mu_2, \sigma_2^2/n_2) \tag{8-4}$$

Independence of the two sampling procedures will ensure that the two
random variables $\bar{X}_1$ and $\bar{X}_2$ are independent; hence (5-31), (5-34), and
(6-13) can be applied directly:

$$(\bar{X}_1 - \bar{X}_2) \sim N\left[(\mu_1 - \mu_2),\ \sigma_1^2/n_1 + \sigma_2^2/n_2\right] \tag{8-5}$$

This distribution of $(\bar{X}_1 - \bar{X}_2)$ is shown in Figure 8-1. Equation (8-5) is
exactly true, assuming that both populations are normal; it still remains
approximately true (by the central limit theorem) for large samples from
practically any populations.

FIG. 8-1 Distribution of $(\bar{X}_1 - \bar{X}_2)$. [Figure: normal curve; the standard deviation
of $(\bar{X}_1 - \bar{X}_2)$ is $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$.]

Under these conditions, our knowledge in (8-5) of how the estimator
$(\bar{X}_1 - \bar{X}_2)$ behaves can now be turned around to construct the confidence
interval:



95% confidence interval for the difference in means $(\mu_1 - \mu_2)$:

$$(\bar{X}_1 - \bar{X}_2) \pm 1.96\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \tag{8-6}$$

When $\sigma_1$ and $\sigma_2$ have a common value, say $\sigma$, the 95% confidence interval
for $(\mu_1 - \mu_2)$ becomes:

$$(\bar{X}_1 - \bar{X}_2) \pm 1.96\,\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \tag{8-7}$$

The variances of the two populations, $\sigma_1^2$ and $\sigma_2^2$ in (8-6), are usually
not known; the best the statistician can do is guess at them, with the variances
$s_1^2$ and $s_2^2$ he observed in his two samples. Provided his sample is large, this
is an accurate enough approximation; but with a small sample, this introduces
a new source of error. The student will recall that this same problem was
encountered in estimating a single population mean in Section 7-1. In the
next section we shall give a solution for these problems of small-sample
estimation.
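
The large-sample interval for a difference in means, based on the distribution
in (8-5), can be sketched in Python as follows. This is a hedged illustration
only: the function name `diff_means_ci`, its parameters, and the numbers in the
usage lines are our own hypothetical choices. The variances passed in may be
the known population variances or, for large samples, the observed sample
variances used as approximations.

```python
from math import sqrt
from statistics import NormalDist


def diff_means_ci(xbar1, xbar2, var1, var2, n1, n2, confidence=0.95):
    """Large-sample confidence interval for (mu1 - mu2).

    var1 and var2 are the two variances (population values, or sample
    variances used as approximations when the samples are large).
    """
    # Critical normal value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Error allowance: z times the standard deviation of (X1-bar - X2-bar).
    allowance = z * sqrt(var1 / n1 + var2 / n2)
    diff = xbar1 - xbar2
    return diff - allowance, diff + allowance


if __name__ == "__main__":
    # Hypothetical figures: means 50 and 47, variances 16 and 9, n = 64 and 36.
    print(diff_means_ci(50.0, 47.0, 16.0, 9.0, 64, 36))
```

Passing `confidence=0.90` uses the corresponding smaller critical value and
gives a narrower interval.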



PROBLEMS 

8-1 A random sample of 100 workers in one large plant took an average of 
12 minutes to complete a task, with a standard deviation of 2 minutes. 
A random sample of 50 workers in a second large plant took an average 
of 11 minutes to complete the task, with a standard deviation of 3 
minutes. Construct a 95% confidence interval for the difference between 
the two population averages. 

8-2 Two samples of 100 seedlings were grown with two different fertilizers.
One sample had an average height of 10 inches and a standard deviation
of 1 inch. The second sample had an average height of 10.5 inches and
a standard deviation of 3 inches. Construct a confidence interval for the
difference between the average population heights $(\mu_1 - \mu_2)$

(a) At the 95% level of confidence. 

(b) At the 90% level of confidence. 

8-3 A random sample of 60 students was taken in two different universities. 
The first sample had an average mark of 77 and a standard deviation of 
6. The second sample had an average mark of 68 and a standard 
deviation of 10. 

(a) Find a 95% confidence interval for the difference between the mean 
marks in the two universities. 

(b) What increase in the sample size would be necessary to cut the error 
allowance by 1/2? 

(c) What increase in the sample size would be necessary to reduce the 
error allowance to 1.0? 



8-2 SMALL SAMPLE ESTIMATION: THE t DISTRIBUTION 

We shall assume in this section that the populations are normal. 

(a) One Mean, $\mu$

In estimating a population mean $\mu$ from a sample mean $\bar{X}$, the statistician
generally has no information on the population standard deviation $\sigma$; hence
he uses the estimator $s$, the sample standard deviation. Substituting this into
(7-10), he estimates the 95% confidence interval for $\mu$ as,

$$\mu = \bar{X} \pm z_{.025}\frac{s}{\sqrt{n}} \tag{8-8}$$

Provided his sample is large (at least 25-50, depending on the precision
required), this will be a reasonably accurate approximation. But with a
smaller sample size, this substitution introduces an appreciable source of
error. Hence if he wishes to remain 95% confident, his interval estimate
must be broadened. How much?

Recall that $\bar{X}$ has a normal distribution; when $\sigma$ is known, we may
standardize, obtaining

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \tag{8-9}$$

where $Z$ is the standard normal variable. By analogy, we introduce a new
"Student $t$"¹ variable, defined as

$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}} \tag{8-10}$$

The similarity of these two variables is immediately evident. The only
difference is that $Z$ involves $\sigma$, which is generally unknown; but $t$ involves
$s$, which can always be calculated from an observed sample. The precise
distribution of $t$, like $Z$, has been derived by mathematicians and is shown
in Table V of the Appendix. The distribution of $t$ is compared to $Z$ in Figure
8-2.

FIG. 8-2 The standard normal distribution and the $t$ distribution compared. [Figure: the
normal curve (same as $t$ with d.f. = $\infty$) drawn together with a more spread-out $t$
distribution, whose critical value $t_{.025} = 4.30$ lies beyond the normal critical value $z_{.025}$.]

(We must emphasize a break in notation. Until now, capital letters
denoted random variables, while small letters denoted their realized values.
But from now on, in order to conform to common usage, we shall entirely
forget this convention; to represent either random variables or realized
values, we shall use small letters $t$ and $s$, and capital letters $X$, $\bar{X}$, $Z$, $P$, etc.)

As expected, the $t$ distribution is more spread out than the normal,
since the use of $s$ rather than $\sigma$ introduces a degree of uncertainty. Moreover,
while there is one standard normal distribution, there is a whole family of
$t$ distributions. With small sample size, this distribution is considerably more
spread out than the normal; but as sample size increases, the $t$ distribution
approaches the normal, and for samples of about 50 or more, the normal
becomes a very accurate approximation.

1 This $t$ variable was first introduced by Gosset writing under the pseudonym "Student,"
and later proved valid by R. A. Fisher. We make no attempt to develop the entire proof,
because it is not very instructive. It can be found in almost any mathematical statistics text.

The distribution of $t$ is not tabled according to sample size ($n$), but
rather according to "degrees of freedom," the divisor in $s^2$. Thus, in
calculating $s^2$ we may write²:

$$\text{d.f.} = \text{degrees of freedom} = n - 1 \tag{8-11}$$

For example, for a sample with $n = 3$, then d.f. = 2, and we find from
Appendix Table V that the critical $t$ value which leaves 2½% probability
in the upper tail is

$$t_{.025} = 4.30$$

This is shown in Figure 8-2. By symmetry, it follows that for any observed $t$

$$\Pr(-4.30 < t < 4.30) = 95\% \tag{8-12}$$

Substituting for $t$ according to (8-10):

$$\Pr\left(-4.30 < \frac{\bar{X} - \mu}{s/\sqrt{n}} < 4.30\right) = 95\% \tag{8-13}$$

This deduction can now be "turned around" into the following inference:
for a sample of size 3, the 95% confidence interval for $\mu$ is

$$\mu = \bar{X} \pm 4.30\frac{s}{\sqrt{n}} \tag{8-14}$$

For a general sample size $n$, the 95% confidence interval for $\mu$ is

$$\mu = \bar{X} \pm t_{.025}\frac{s}{\sqrt{n}} \tag{8-15}$$

where $t_{.025}$ is the critical $t$ value leaving 2½% of the probability in the
upper tail, with $n - 1$ degrees of freedom.

2 The phrase "degrees of freedom" is explained in the following intuitive way:
Originally there are $n$ degrees of freedom in a sample of $n$ observations. But one
degree of freedom is used up in calculating $\bar{X}$, leaving only $n - 1$ degrees of freedom for
the residuals $(X_i - \bar{X})$ to calculate $s^2$.
For example, consider a sample of two observations, 21 and 15, say. Since $\bar{X} = 18$,
the residuals are +3 and -3, the second residual necessarily being just the negative of the
first. While the first residual is "free," the second is strictly determined; hence there is only
1 degree of freedom in the residuals.
Generally, for a sample of size $n$, it may be shown that if the first $n - 1$ residuals are
specified, then the last residual is automatically determined by the requirement that the sum
of all residuals be zero, i.e., $\sum (X_i - \bar{X}) = 0$.

To sum up, we note the similarity of $t$ estimation in (8-15) and normal
estimation in (7-10). The only difference is that an observed sample value ($s$)
is substituted for $\sigma$, and as a consequence a critical $t$ value must be substituted
for the normal value.

An important practical question is: "When do we use the $t$ distribution
and when do we use the normal?" If $\sigma$ is known, the normal distribution is
used; but if $\sigma$ is unknown, then the $t$ distribution is theoretically appropriate,
regardless of sample size. However, if the sample size is large, the normal
is an accurate enough approximation³ of the $t$. So in practice the $t$ distribution
is used only for small samples when $\sigma$ is unknown, and the normal is used
otherwise. The $t$ distribution has one additional requirement: the parent
population from which the sample is drawn is assumed normal. (But normality is a
requirement for all our small-sample estimation, even if $\sigma$ is known. Recall
from Chapter 7 that inference about a nonnormal population was validated
by the central limit theorem only if the sample size was large.)

As sample size ($n$) decreases, estimation becomes less precise (i.e.,
interval estimates become wider). The two reasons for this are clearly
distinguished in (8-15). First, the divisor $\sqrt{n}$ becomes smaller. This appears in
(7-10) as well as in (8-15); thus even if $\sigma$ is known and inference is based on
the normal distribution, the error allowance increases and the interval
estimate becomes wider as a consequence. The secondary reason for loss of
precision occurs if $s$ must be substituted for an unknown $\sigma$. The smaller the
sample, the more the appropriate $t$ distribution will depart from the normal;
and the more spread the $t$ distribution, the broader the interval estimate.
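
The small-sample interval (8-15) can be sketched in Python as follows. This is
an illustration under our own assumptions: the function name `t_interval_95`,
the sample in the usage line, and the short lookup table `T_025` are
hypothetical. The table holds a few standard two-decimal critical values of
$t_{.025}$ (including the values 4.30 for 2 d.f. and 2.00 for 60 d.f. quoted in
the text) and is not a substitute for Appendix Table V.

```python
from math import sqrt
from statistics import mean, stdev

# A few critical values t_.025, keyed by degrees of freedom (illustrative subset).
T_025 = {1: 12.71, 2: 4.30, 3: 3.18, 4: 2.78, 5: 2.57, 10: 2.23, 60: 2.00}


def t_interval_95(sample):
    """95% confidence interval for mu from a small normal sample, as in (8-15)."""
    n = len(sample)
    t_crit = T_025[n - 1]      # degrees of freedom = n - 1
    s = stdev(sample)          # sample standard deviation (divisor n - 1)
    allowance = t_crit * s / sqrt(n)
    center = mean(sample)
    return center - allowance, center + allowance


if __name__ == "__main__":
    # Hypothetical sample of size 3, so d.f. = 2 and t_.025 = 4.30.
    print(t_interval_95([21, 15, 18]))
```

A sample whose degrees of freedom are not in the short table raises a
`KeyError`; a fuller table or a statistical library would be needed in practice.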



(b) Difference hi Two Population Means (^ — jn 2 ) 

We shall ajsume, as often occurs in practice, that even though the two 
populations may have different means, they have a common variance, a 2 . 



3 This may be verified from Table V. For example, a 95% confidence interval constructed 
from a sample of size 61 should use a critical / value of 2.00; but the use of the normal value 
of 1.96 as an approximation involves very little (2%) error. 

As we scan down the /. 02 5 column in Table V, these critical values approach z 025 — 1 .96. 
this verifies Figure 8-2, where the t distributions approach the normal. 
When σ² is known, (8-7) is appropriate. When unknown, σ² must be
estimated. The appropriate estimate is to add up all the squared deviations from
both samples, and then divide by the degrees of freedom (n₁ − 1) + (n₂ − 1),
to obtain an unbiased estimator called the pooled sample variance:



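$$ s_p^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum_{i=1}^{n_1}(X_{1i} - \bar{X}_1)^2 + \sum_{i=1}^{n_2}(X_{2i} - \bar{X}_2)^2\right] $$   (8-16)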



where X_{1i} represents the ith observation in the first sample. Substitution of
s_p for σ in (8-7) requires that the t distribution also be used, obtaining the
95% confidence interval:

$$ (\mu_1 - \mu_2) = (\bar{X}_1 - \bar{X}_2) \pm t_{.025}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} $$   (8-17)

where t_.025 is the critical t value with d.f. = n₁ + n₂ − 2.
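The arithmetic of (8-16) and (8-17) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the two tiny samples are hypothetical, and the critical value t_.025 is supplied by hand from a t table rather than computed.

```python
from math import sqrt

def pooled_variance(sample1, sample2):
    """Pooled sample variance: all squared deviations over n1 + n2 - 2."""
    mean1 = sum(sample1) / len(sample1)
    mean2 = sum(sample2) / len(sample2)
    squared_devs = (sum((x - mean1) ** 2 for x in sample1)
                    + sum((x - mean2) ** 2 for x in sample2))
    return squared_devs / (len(sample1) + len(sample2) - 2)

def difference_interval(sample1, sample2, t_crit):
    """Interval for mu1 - mu2; t_crit is t_.025 for d.f. = n1 + n2 - 2."""
    n1, n2 = len(sample1), len(sample2)
    diff = sum(sample1) / n1 - sum(sample2) / n2
    s_p = sqrt(pooled_variance(sample1, sample2))
    allowance = t_crit * s_p * sqrt(1 / n1 + 1 / n2)
    return diff - allowance, diff + allowance

# Hypothetical data: two samples of 3, so d.f. = 4 and t_.025 = 2.776.
low, high = difference_interval([1, 2, 3], [2, 4, 6], t_crit=2.776)
print(f"{low:.2f} < mu1 - mu2 < {high:.2f}")
```

With samples this small the interval is very wide, which is the loss of precision described earlier for small n.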



PROBLEMS 

8-4 Sixteen weather stations at random locations in a state measure
rainfall. In 1967, they recorded an average of 10 inches and a standard
deviation of 1.5 inches. For the mean rainfall for the state,

(a) Construct a 95% confidence interval. 

(b) Construct a 99% confidence interval. 

8-5 100 cars on a thruway were clocked at an average speed of 69 m.p.h., 
with a standard deviation of 4 m.p.h. Construct a 95% confidence 
interval for the mean speed of all cars on this thruway. 
(8-6) A random sample of 4 students in a large statistics course received the 
following marks: 56, 70, 55, 59. Construct a 95% confidence interval 
for the average mark of all students in the course. 

8-7 From a sample of five random normal numbers from Table IIb find
a 95% confidence interval for the mean of the population.

8-8 Five people selected at random had their breathing capacity measured 
before and after a certain treatment, obtaining the following data: 



                  Breathing Capacity
Person      Before (X)      After (Y)      Improvement
A           2750            2850           +100
B           2360            2380            +20
C           2950            2800           −150
D           2830            2860            +30
E           2250            2300            +50

Let μ_X (and μ_Y) be the mean capacity of the whole population before
(and after) treatment.

(a) What is the (point) estimate of the mean improvement (μ_Y − μ_X)?

(b) Construct a 95% confidence interval for (μ_Y − μ_X).

8-9 In a random sample of 10 football players, the average age was 27
and the sum of squared deviations was 300. In a random sample of 20
hockey players, the average age was 25 and the sum of squared deviations
was 450. Estimate, with a 95% confidence interval, the difference
in the population means, assuming σ₁ = σ₂.

8-10 Given the following random samples from 2 populations:

n₁ = 25    X̄₁ = 60.0    s₁ = 12
n₂ = 15    X̄₂ = 68.0    s₂ = 10

and assume σ₁ = σ₂.
Find a 95% confidence interval for (μ₁ − μ₂).

8-11 Derive the confidence interval (8-14) from (8-13). (Hint: Use practically
the same method as in the footnote to equation (7-4).)

8-12 Derive the confidence interval (8-6) from (8-5).



8-3 ESTIMATING POPULATION PROPORTIONS: THE 
ELECTION PROBLEM ONCE AGAIN 



In Section 6-6, we saw that a sample proportion P is just a sample
mean X̄ in disguise. For example, if we observe 4 Democrats in a sample
of 10, then

$$ P = \bar{X} = \tfrac{1}{10}(1 + 0 + 1 + 0 + 0 + 0 + 1 + 0 + 1 + 0) = \tfrac{4}{10} $$

Similarly, the population proportion π is just the population mean in
disguise. The simplest method of deriving an interval estimate for a
proportion is therefore to modify (7-10), the interval estimate for a mean. Thus
the 95% confidence interval for π is

$$ \pi = P \pm 1.96\sqrt{\frac{\pi(1 - \pi)}{n}} $$   (8-18)



We confirm that (8-18) is just a recasting of (7-10). X̄ is replaced by P, and
√(σ²/n) (the standard deviation of X̄) is replaced by √(π(1 − π)/n) [the standard
deviation of P, as given in (6-30)].
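The "mean in disguise" idea can be checked numerically. The short sketch below uses a hypothetical sample of ten 0-1 values (1 for a Democrat, 0 otherwise); the variable names are illustrative only. It shows that the proportion is the mean of the 0-1 values, and that their mean squared deviation about P collapses to P(1 − P), the sample counterpart of the population variance π(1 − π).

```python
# Hypothetical sample of 10 voters: 1 = Democrat, 0 = otherwise.
votes = [1, 0, 1, 0, 0, 0, 1, 0, 1, 0]

n = len(votes)
P = sum(votes) / n  # sample proportion = sample mean of the 0-1 values

# For 0-1 data the mean squared deviation about P reduces to P(1 - P).
mean_squared_deviation = sum((v - P) ** 2 for v in votes) / n

print(P, mean_squared_deviation, P * (1 - P))
```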

But we seem to have reached an impasse; the unknown π appears in the
right-hand side of (8-18). Fortunately, the situation has a remedy: substitute






the sample P for π in the right side of (8-18). This is a strategy we have used
before, when we substituted s for σ in the confidence interval for μ. Again,
this approximation introduces another source of error; but with a large
sample size, this is no great problem. Thus:



For large samples, the 95% confidence interval for π is:

$$ \pi = P \pm z_{.025}\sqrt{\frac{P(1 - P)}{n}} $$   (8-19)

where z_.025 is the critical value leaving 2½% of the probability in the upper
tail. As an example, the voter poll of Section 1-1 used this formula.

For small samples, there are several options. The simplest is to read
the interval estimate for π from Figure 8-4, a table which is constructed
in the following manner. The first step is the mathematical deduction of how
the variable estimator P is distributed, for any population π. This is shown
for a sample size 20 in Figure 8-3. Thus, for example, if π = .4, then the
sample P has the dotted distribution shown in this diagram, and there is a
95% probability that any P calculated from a random sample of 20 will lie
in the interval ab. For each possible value of π, such a probability function
of P defines two critical points like a and b. When all such points are joined,
the result is the two curves enclosing a 95% probability band.

[Figure 8-3: the 95% probability band, with the sample P on the horizontal axis
(0 to 1.0) and the population π on the vertical axis (0 to 1.0). Marked on the
diagram: the outline of the discrete probability function of P when π = .4, and
the observed P₁ = .55.]

FIG. 8-3 Distribution of P.

FIG. 8-4 95% confidence intervals for population proportions (π). [Reproduced with
the permission of Professor E. S. Pearson from C. J. Clopper and E. S. Pearson, "The Use
of Confidence or Fiducial Limits Illustrated in the Case of the Binomial," Biometrika, 26
(1934), p. 404.]

This description of how the statistic P is related to the population π
can of course be "turned around" to draw a statistical inference about π
from a given sample P. For example, if we have observed a sample proportion
P₁ = 11/20 = .55, then the 95% confidence interval for π is defined by fg, the
width of this probability band above P₁, i.e.,

    .31 < π < .77                                                    (8-20)
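Both directions of this argument can be traced numerically. The sketch below is an illustration, not the procedure used to draw the published chart: it assumes the exact binomial distribution for the number of successes, places at most 2½% of the probability in each tail, and finds the interval limits by simple bisection. All function names are hypothetical.

```python
from math import comb

def binom_cdf(k, n, pi):
    """Pr(X <= k) when the number of successes X is Binomial(n, pi)."""
    return sum(comb(n, j) * pi**j * (1 - pi)**(n - j) for j in range(k + 1))

def probability_band(n, pi, tail=0.025):
    """Deduction: critical points a and b of the sample P for a given pi,
    each cutting off at most `tail` of the probability."""
    a = next(k for k in range(n + 1) if binom_cdf(k, n, pi) > tail)
    b = next(k for k in range(n + 1) if binom_cdf(k, n, pi) >= 1 - tail)
    return a / n, b / n

def confidence_interval(x, n, tail=0.025, iterations=60):
    """Induction: the values of pi for which the observed count x
    does not fall in either tail."""
    def bisect(still_below_limit):
        lo, hi = 0.0, 1.0
        for _ in range(iterations):
            mid = (lo + hi) / 2
            if still_below_limit(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: the pi at which Pr(X >= x) has just risen to `tail`.
    lower = 0.0 if x == 0 else bisect(
        lambda pi: 1 - binom_cdf(x - 1, n, pi) < tail)
    # Upper limit: the pi at which Pr(X <= x) has just fallen to `tail`.
    upper = 1.0 if x == n else bisect(
        lambda pi: binom_cdf(x, n, pi) > tail)
    return lower, upper

band = probability_band(20, 0.4)      # points like a and b when pi = .4
ci = confidence_interval(11, 20)      # observed P = 11/20 = .55
print(band, ci)
```

The limits found for 11 successes in 20 should agree, to about two decimals, with the interval (8-20) read from the chart.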



Whereas the (deduced) probability interval is defined in the horizontal
direction of the P axis, the (induced) confidence interval is defined in the
vertical direction of the π axis.

This is the same logic we have used in deriving confidence intervals
before. We will nevertheless pause briefly to review, because this is a more
generalized argument than we have previously encountered. Suppose the
true value of π is .4; then there is a probability of 95% that a sample P will
fall between a and b. If and only if it does (e.g., P₁) will the confidence interval
we construct bracket the true π of .4. We are therefore using an estimation
procedure which is 95% probable to bracket the true value of π, and thus
yield a correct statement. But we must recognize the 5% probability that the
sample P will fall beyond a or b (e.g., P₂); in this case our interval estimate
will not bracket π = .4, and our conclusion will be wrong.

Why is this a more general theory of confidence interval estimation?
In previous instances (e.g., estimating a population mean, μ) we constructed
a confidence interval symmetrically about our point estimate X̄. But in
estimating π, no such symmetry is generally involved.⁴ For example, with
our observed sample proportion P₁ = .55, the confidence interval (8-20) we
constructed for π was not symmetric about our point estimate .55.

The 95% probability band in Figure 8-3 is set out in Figure 8-4, along
with the similar bands appropriate for other sample sizes. This neater
diagram is used to construct 95% confidence intervals for π.

As an example, if we have observed a P = .6 in a sample of 15, the 95%
confidence interval for π is approximately

    .32 < π < .84

For the same P = .6 in a larger sample of 100, the 95% confidence
interval for π is narrower:

    .50 < π < .70



Alternatively, with such a large sample, (8-19) could have been used, with
the same result, i.e.,

$$ \pi = .60 \pm 1.96\sqrt{\frac{.60(.40)}{100}} = .60 \pm .10 $$
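The large-sample calculation just shown is easily mechanized. The following is a minimal sketch of (8-19); the function name is hypothetical, and the default z of 1.96 is the 95% critical value used in the text.

```python
from math import sqrt

def proportion_interval(P, n, z=1.96):
    """Large-sample interval for a proportion: P plus or minus z * sqrt(P(1 - P)/n)."""
    allowance = z * sqrt(P * (1 - P) / n)
    return P - allowance, P + allowance

low, high = proportion_interval(0.60, 100)
print(f"{low:.2f} < pi < {high:.2f}")
```

The error allowance here is a little under .10, which rounds to the interval quoted above.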

Finally, there is a third method of estimating π that we introduce, not
so much for its practical value as for its illustration of this useful principle: 
with a little imagination, several alternative methods of solving a problem 
can often be developed, and the most appropriate one to use in a given set 
of circumstances is a matter of judgment. 

Let us be conservative, and ask: what is the maximum width of the 
interval estimate in (8-18), i.e., what is the maximum value that the error 

4 The student may have wondered why the 95% probability band does not converge on the
two end points O and R. It is true that one half of this band (made up of all points similar
to b) does intersect the P axis at O; this means that if π is zero (e.g., no Socialists in the U.S.),
then any sample P must also be zero (no Socialists in the sample). But the other half of this
band does not intersect the π axis at O; instead it intersects at h. This means that an observed
P of zero (e.g., no Socialists in a sample) does not necessarily imply that π is zero (no
Socialists in the U.S.).
allowance 1.96 √(π(1 − π)/n) can have? It is easily shown⁵ that the maximum
value of π(1 − π) is 1/4.

Then (8-18), the 95% confidence interval for π (with a large sample),
can be written

$$ \pi = P \pm 1.96\sqrt{\frac{1}{4n}} $$

or simply:

$$ \pi = P \pm \frac{.98}{\sqrt{n}} $$   (8-21)

But this is assuming the worst; if, in fact, π is not 1/2, then π(1 − π) is less
than 1/4, and our interval estimate need not be this wide; or to restate,
(8-21) is an interval estimate for π with at least a 95% level of confidence.
For example, this very simple formula is sometimes used in political polls
where it is known on the basis of historical experience that the proportion of
Democrats is close to 1/2. In these circumstances (8-21) becomes a very
accurate approximation.
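How much is lost by "assuming the worst" can be seen by comparing the two error allowances directly. The sketch below is illustrative only; the function names and the particular values of P and n are assumptions chosen for the comparison.

```python
from math import sqrt

def usual_allowance(P, n, z=1.96):
    """Error allowance using the sample P, as in the large-sample formula."""
    return z * sqrt(P * (1 - P) / n)

def conservative_allowance(n, z=1.96):
    """Error allowance with pi(1 - pi) replaced by its maximum value 1/4."""
    return z * sqrt(1 / (4 * n))

n = 100
for P in (0.50, 0.60, 0.75, 0.90):
    usual = usual_allowance(P, n)
    worst = conservative_allowance(n)
    print(f"P = {P:.2f}: usual {usual:.3f}, conservative {worst:.3f}")
```

The two allowances coincide at P = 1/2 and drift apart only slowly as P moves away from 1/2, which is why the simple formula serves well when the proportion is known to be near one half.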

For completeness, we write the 95% confidence interval for the difference
in 2 proportions:

For large n,

$$ (\pi_1 - \pi_2) = (P_1 - P_2) \pm 1.96\sqrt{\frac{P_1(1 - P_1)}{n_1} + \frac{P_2(1 - P_2)}{n_2}} $$   (8-22)

This is derived in essentially the same way as (8-6).
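As a rough sketch of how (8-22) is applied, the following assumes two independent large samples; the function name and the counts are hypothetical, not taken from any problem in the text.

```python
from math import sqrt

def difference_of_proportions_interval(x1, n1, x2, n2, z=1.96):
    """Large-sample interval for pi1 - pi2 from success counts x1 of n1 and x2 of n2."""
    P1, P2 = x1 / n1, x2 / n2
    allowance = z * sqrt(P1 * (1 - P1) / n1 + P2 * (1 - P2) / n2)
    return (P1 - P2) - allowance, (P1 - P2) + allowance

# Hypothetical counts: 120 of 200 in the first sample, 75 of 150 in the second.
low, high = difference_of_proportions_interval(120, 200, 75, 150)
print(f"{low:.3f} < pi1 - pi2 < {high:.3f}")
```

In this example the interval includes zero, so these hypothetical samples alone would not establish that the two population proportions differ.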



5 The simplest way is with calculus, setting the derivative of π(1 − π) equal to zero.
To prove it without calculus, we may simply graph f(π) = π(1 − π), as follows:

[Graph: f(π) = π(1 − π) plotted against π for 0 ≤ π ≤ 1.]

Note that for either extreme value of π (1 or 0) the value of f(π) is zero; and if π = 1/2,
then π(1 − π) reaches its maximum value because of symmetry of the parabola.
PROBLEMS

8-13 Construct a 95% confidence interval for π, the proportion of
Republican voters in the U.S., if there were 4820 Republicans in a random
sample of 10,000.

8-14 In a random sample of tires produced by a certain company, 20%
did not meet the company's standards. Construct a 95% confidence
interval for the proportion π (in the whole population of tires)
which do not meet the standards,

(a) If the sample size n = 10. 

(b) If n = 25. 

(c) If n = 2500. 

(8-15) By talking to 15 voters you discover that only 3 favor a certain
candidate. Construct a 95% confidence interval for the proportion
of all voters favoring this candidate.

(8-16) In a random sample of 100 British smokers, 28 preferred brand X.
Construct a 95% confidence interval to estimate the proportion of all
British smokers who prefer X.

8-17 In a survey of U.S. consumer intentions, 498 families in a random
sample of 2500 indicated that they intended to buy a new car within
a year. Construct a 95% confidence interval for the proportion of all
U.S. families intending a new car purchase.

(a) Answer two ways; 

(1) Using the usual formula (8-19). 

(2) Using the simplified formula (8-21). 

(b) If the sample P had been .40, would the error in (a2) have been 
greater? 

8-18 If π = 3/4, what is the precise percentage error introduced by using
(8-21) rather than (8-19)? Does this suggest that (8-21) is a reasonable
approximation, provided 1/4 < π < 3/4?

8-19 A sample of 100 cars was taken in each of 2 cities. In one city 72 of 
the cars passed the safety test; in the second only 66 passed. Construct 
a 95% confidence interval for the difference between the proportions 
of safe cars in the two cities. 

8-20 A sample of 3182 voters yielded the following frequency table,*
relating their attitudes to Senator Joseph McCarthy and their vote
in 1948. Construct a 95% confidence interval for (π₁ − π₂), where π₁
is the proportion of all Democrat voters who were pro-McCarthy,
and π₂ is the proportion of Republican voters who were pro-McCarthy.

                 Attitude to McCarthy
1948 Vote        Pro        Anti
Democrat         506        1381
Republican       563         732

* From S. M. Lipset, "The Radical Right," in Daniel Bell, ed., The New American Right,
New York: Criterion Books (1955).



(8-21) In an urban survey of 1000, 790 favored certain legislation. In a rural
survey of 300, 180 opposed the same legislation.

Construct a 99% confidence interval for the difference between
the proportions of city and country voters who favor the legislation.



*8-4 ESTIMATING THE VARIANCE OF A NORMAL 

POPULATION: THE CHI-SQUARE DISTRIBUTION 

There is one further example of a confidence interval, interesting not so
much for its practical value⁶ as for the insight it provides.

Consider a normal population N(μ, σ²) with both μ and σ² unknown.
So far we have estimated σ² with s² only as a means of finding a confidence
interval for μ. Now suppose, on the other hand, that our primary interest is
in σ², rather than μ. For example, we may wish to ask "How much variance
is there in Japan's balance of payments?" in order to get some indication
of the country's requirement of foreign exchange reserves. Or, we may ask
"What is the variance of farm income?" in order to evaluate whether a
policy aimed at stabilizing farm income is necessary.⁷ To estimate variance
we shall assume that the random variable (e.g., farm income) is normally
distributed; if so, how do we proceed?

We have already seen, in Section 7-2, that s² is an unbiased estimator
of σ²; but to construct an interval estimate for σ² we must ask: "how is the
estimator s² distributed around σ²?" To answer this, it is customary to

6 One reason that the confidence interval for σ² is of limited practical use is that it depends
crucially on the assumption that the parent population is normal. By contrast, most of the
confidence intervals for means remain approximately true even if the parent population is
nonnormal; such confidence intervals are called robust.

7 Income stabilization policies are almost always designed to stabilize income around a
reasonably high level. Thus they aim both at reducing variance (σ²) and raising average
income (μ). Here we concentrate only on the variance problem.
[Figure 8-5: probability density p(C²) plotted against C² = s²/σ², with the critical
points .325 and 2.05 marked on either side of C² = 1 (where s² = σ²).]

FIG. 8-5 Distribution of the modified chi-square, C².

define a new variable:

$$ C^2 = \frac{s^2}{\sigma^2} $$   (8-23)

Of course, when s² = σ², this ratio is 1; thus our question can be rephrased:
"how is C² distributed around 1?"

C² is called a modified Chi-square variable, with n − 1 degrees of
freedom.⁸ It has been proved by advanced calculus that the distribution of C² is
that of Figure 8-5; critical values are given in Appendix Table VI.

Since its numerator s² and denominator σ² are both positive, the variable
C² is also always positive, with its distribution falling to the right of zero in
Figure 8-5. For small sample values we note that it is also skewed to the
right; but as n gets large, this skewness disappears and the C² distribution
approaches normality. Since s² is an unbiased estimator of σ², this implies
that the expected value of each of these C² distributions is 1. Moreover as
sample size increases C² becomes more and more heavily concentrated
around 1, indicating that s² is becoming an increasingly accurate estimator
of σ².
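These properties of C² can be observed in a small sampling experiment. The sketch below is an illustration under assumptions of our own: samples are drawn from a normal population with an arbitrary mean of 10 and σ of 2, and the number of trials, the seed, and all function names are hypothetical.

```python
import random
from math import sqrt

def sample_variance(values):
    """Unbiased s^2: squared deviations divided by n - 1."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def simulate_c2(n, sigma=2.0, trials=4000, seed=1):
    """Repeatedly sample n normal values and record C^2 = s^2 / sigma^2."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(trials):
        sample = [rng.gauss(10.0, sigma) for _ in range(n)]
        ratios.append(sample_variance(sample) / sigma ** 2)
    return ratios

def summarize(ratios):
    """Average and spread (standard deviation) of the simulated C^2 values."""
    centre = sum(ratios) / len(ratios)
    spread = sqrt(sum((r - centre) ** 2 for r in ratios) / len(ratios))
    return centre, spread

for n in (6, 11, 51):
    centre, spread = summarize(simulate_c2(n))
    print(f"n = {n:2d}: average C2 = {centre:.3f}, spread = {spread:.3f}")
```

In each case the average of the simulated C² values should lie near 1, while the spread shrinks as n grows.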

With this deduction of how the estimator s² is distributed around its
target σ², we may now infer a 95% confidence interval for σ² using our
now-familiar technique. We illustrate with sample size n = 11 (d.f. = 10). From

8 C² is comprised of the constant parameter σ², and the variable s². Thus it has the same
degrees of freedom as s² [explained in the footnote to equation (8-11)].



Figure 8-5, or more precisely from Table VI, we find the critical points
cutting off 2½% of the distribution in each tail; thus

$$ \Pr\left(.325 < \frac{s^2}{\sigma^2} < 2.05\right) = 95\% $$   (8-24)

Solving for σ², we obtain the equivalent statement

$$ \Pr\left(\frac{s^2}{2.05} < \sigma^2 < \frac{s^2}{.325}\right) = 95\% $$   (8-25)

If the observed value of s² turns out to be 3.6, then the 95% confidence
interval for σ² is

$$ 1.76 < \sigma^2 < 11.1 $$   (8-26)

We note that this is another example of an asymmetrical confidence interval.
In general, the upper and lower critical values of C² are denoted C²_.025
and C²_.975, and the 95% confidence interval is written

$$ \frac{s^2}{C^2_{.025}} < \sigma^2 < \frac{s^2}{C^2_{.975}} $$   (8-27)

PROBLEMS



*8-22 If a sample of 25 IQ scores from a certain population has s² = 120,
construct a 95% confidence interval for the population σ².

*8-23 From the sample of Problem 8-6, construct a 95% confidence interval
for σ².



Review Problems 

8-24 Two machines are used to produce the same good. In 400 articles
produced by machine A, 16 were substandard. In the same length of
time, the second machine produced 600 articles, and 60 were
substandard. Construct 95% confidence intervals for

(a) π₁, the true proportion of substandard articles from the first
machine.

(b) π₂, the true proportion of substandard articles from the second
machine.

(c) The difference between the two proportions (π₁ − π₂).






8-25 To determine the effectiveness of a certain vitamin supplement, the 
following data were obtained : 

Table of weight increases (in grams) for 2 groups of 3 mice. 
Control Group Treated Group 

1/ YZ > 18 *1 ; 

1/ 14. : 23 ■ 



Assume that σ₁ = σ₂, and that the mice are not paired [i.e., the
first row of data (12 and 18) does not come from mice that are related
by kinship, or anything else]. Construct a 95% confidence interval
for the "vitamin effect," μ₂ − μ₁.
8-26 Suppose a psychologist runs 6 people through a certain experiment.
In order to find the effect on heart rate, he collects the following data:

              Heart Rate (Beats per Minute)
Person        Before Experiment     After Experiment
Smith         71                    84
Jones         67                    72
Gunther       71                    70
Wilson        78                    85
Pestritto     64                    71
Seaforth      70                    81



Suppose that it is known that people as a whole have an average 
heart rate approximately normally distributed, with mean 73. 

Calculate a 95 % confidence interval for the effect of the experi- 
ment on heart rate. 
8-27 A certain scientist concluded his study in fertility control as follows: 
"So far one result has emerged from the before-and-after survey, and 
it is a key measure of the outcome: at the end of 1962, 14.2% of the 
women in the sample were pregnant, and at the end of 1963 (after 
the birth-control campaign) 11.4% of the women (in a second inde- 
pendent sample) were pregnant, a decline of about one fifth." 

If the samples (both before and after) included 2500 women, 
what statistical qualification would you add to the above statement, 
in order to make its meaning clearer? 
Hypothesis Testing



9-1 TESTING A SIMPLE HYPOTHESIS 

We begin with a very simple example, in order to keep the philosophical
issues clear. Suppose that I am gambling with a die, and lose whenever the
die shows ace. After 100 throws I notice that I have suffered an inordinate
number of losses: 27 aces. This makes me suspect that my opponent is using
a loaded die; specifically, I begin to wonder whether this is one of the crooked
dice recently advertised as giving aces one-quarter of the time.

Is my suspicion well-founded, so that I should make an accusation,
and terminate the game? My decision should depend on several factors.

1. How much did I trust my opponent even before I began the game
(prior to collecting the evidence)? For example, if I am playing with a
sharp-looking character I have just met on a Mississippi steamboat I will be more
inclined to terminate the game than if I am playing with an old and trusted
friend.

2. What are my potential losses involved in making a wrong decision?
I may be playing with very attractive odds in my favor; if the die is, after
all, a true one, then I will have a good deal to lose if I erroneously conclude
that it is crooked and terminate the game.

3. Does the evidence itself (27 aces in 100 tosses) indicate that I am being
cheated?

If questions (1) and (2) can be answered, even roughly, then it is useful
to put this whole problem into the larger framework of decision theory
(Chapter 15). However, in many practical problems, the first two questions
cannot easily be answered; using a medical example, what is the cost of
making the wrong decision and certifying a drug which has serious side
effects? In many instances it is only question (3) that can be answered by
the scientist, and it is this limited but extremely important question which we
address in this chapter.

First, state the two conflicting hypotheses as precisely (mathematically)
as possible. The hypothesis that the die is fair is really a statement that the
Bernoulli population of all possible throws has a proportion of aces equal to
1/6. This is the hypothesis of "no cheating" or "nothing out of the ordinary."
Customarily, it is called the null hypothesis,

$$ H_0: \pi = 1/6 = .167 $$   (9-1)

The other hypothesis is that the probability of an ace is 1/4; this is customarily
called the alternate hypothesis,

$$ H_1: \pi = 1/4 = .25 $$   (9-2)

Suppose that even before evidence was collected (before we started
throwing the die) the following plausible decision rule was suggested. After
100 throws, we should:

    Accept H₀ if no more than 20 aces occur
    (i.e., if observed P ≤ .20)
                                                                     (9-3)
    Reject H₀ (i.e., accept H₁) if more than
    20 aces occur (P > .20)

This decision rule is shown in Figure 9-1a. The value .20 which separates
the two regions is often called the critical point, while P > .20 is referred to
as the critical range, denoting observed values of P that will lead us to reject
H₀. When H₀ is rejected, we call the results statistically significant.

Of course, this rule will not always lead to the right decision, because
of chance fluctuation (bad luck). We can hope, however, that the probability
of error is small. To find out how small we apply probability analysis, as
in Figure 9-1b.

First, how well will rule R work if H₀ is true? The distribution of P is
then concentrated around its mean value .167, with only a small probability
that an observed P will be greater than .20, causing R to give the wrong
answer ("reject H₀"). We now ask, "How small is the probability of this
error?" Recall from Chapter 6 that a sample proportion P has an approximate
normal distribution, with

$$ \mu_P = \pi = 1/6 = .167 $$

and

$$ \sigma_P = \sqrt{\frac{\pi(1 - \pi)}{n}} = .037 $$   (9-4)




[Figure 9-1: (a) The decision rule R on the P axis (marked at .10, .15, .20, .25):
accept H₀ below the critical point .20, reject H₀ above it. (b) Either H₀ is true,
in which case α = probability of type I error = .18. (c) Or H₀ is false (i.e., H₁ is
true), in which case β = probability of type II error = .13.]

FIG. 9-1 Hypothesis testing. (a) Known decision rule, R. (b) and (c) Unknown world,
with two possibilities.

This distribution of P is shown in Figure 9-1b. The probability of error
is easily calculated by evaluating the probability to the right of .20, that is¹

$$ \Pr(P > .20 / H_0) = \Pr\left(\frac{P - \mu_P}{\sigma_P} > \frac{.20 - .167}{.037}\right) = \Pr(Z > .9) = .18 = \alpha, \text{ let us say} $$   (9-5)

This error of rejecting H₀ when it is true is called a type I error, with its
probability denoted α.

On the other hand, suppose that H₀ is false; what is the probability of
error then? In this case H₁ is true, and π = .25; the distribution of P is
again approximately normal. According to the rule R, we will make an error
in accepting the false H₀ if we observe P < .20. The probability of this

1 In this chapter, we write Pr ( /H₀) to mean "probability, assuming H₀ is true."
error is therefore calculated by evaluating the normal distribution in Figure
9-1c lying to the left of .20. Since the mean and standard deviation of this
distribution are

$$ \mu_P = \pi = .25 $$   (9-6)

$$ \sigma_P = \sqrt{\frac{\pi(1 - \pi)}{n}} = .043 $$   (9-7)

it follows that:

$$ \Pr(P < .20 / H_1) = \Pr\left(\frac{P - \mu_P}{\sigma_P} < \frac{.20 - .25}{.043}\right) $$   (9-8)

$$ = \Pr(Z < -1.15) = .13 = \beta, \text{ let us say} $$   (9-9)

This error of accepting H₀ when it is false is called a type II error, with its
probability denoted β.
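The two error probabilities can be recomputed with the normal approximation described above. The sketch below is illustrative: the function name is hypothetical, and it relies on the `NormalDist` class of Python's standard `statistics` module (available from Python 3.8). Because it does not round Z to one or two decimals, its results differ slightly from the table look-ups in the text.

```python
from math import sqrt
from statistics import NormalDist

def error_probabilities(critical_point, n, pi_0=1/6, pi_1=1/4):
    """Approximate alpha and beta for the rule 'reject H0 if P > critical_point'."""
    p_if_h0 = NormalDist(mu=pi_0, sigma=sqrt(pi_0 * (1 - pi_0) / n))
    p_if_h1 = NormalDist(mu=pi_1, sigma=sqrt(pi_1 * (1 - pi_1) / n))
    alpha = 1 - p_if_h0.cdf(critical_point)  # reject H0 although it is true
    beta = p_if_h1.cdf(critical_point)       # accept H0 although H1 is true
    return alpha, beta

alpha, beta = error_probabilities(0.20, 100)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
```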

The terminology of testing is reviewed in Table 9-1. Note that the
probabilities in each row must sum to 1; this must follow, so long as we use
a rule (like R) which involves the decision either² to accept or reject H₀.

We now recall that our decision rule R in (9-3) was determined
arbitrarily. We now ask: "Is there a better decision rule, i.e., a better critical
point for our test than P = .20?" Of course, we should like to make the
probabilities of error (α and β) as small as possible, but these two objectives
conflict. This is illustrated in Figure 9-2, which is a condensed version of



Table 9-1 Possible Errors in Hypothesis Testing

                                        Decision
State of the world     Accept H₀                     Reject H₀
If H₀ is true          Correct decision.             Type I error.
                       Probability = 1 − α;          Probability = α;
                       corresponds to                also called
                       "confidence level"            "significance level"
If H₀ is false         Type II error.                Correct decision.
(H₁ true)              Probability = β               Probability = 1 − β;
                                                     also called "power"



2 Of course other more complicated decision rules may be used. For example, the statistician 
may decide to suspend judgement if the observed P is in the region around .20 (say .18 < 
P < .22). If he observes an ambiguous P in this range, he would then undertake a second 
stage of sampling— which might yield a clear-cut decision, or might lead to further stages of 
sequential sampling. 
[Figure 9-2: the test of Figure 9-1 with the critical point moved from .20 to .22;
now α = .08 and β = .25.]

FIG. 9-2 Illustration of how reducing α increases β (compare with Fig. 9-1).



Figure 9-1, except that the critical point has been moved up from .20 to .22.
As we hope, this does reduce α; but it also increases β. Moreover, we note
that the only way to eliminate α is to move the critical point far enough to
the right, but as we do so β approaches 1; i.e., our test becomes "powerless"
since we can no longer reject even the most dishonest die. Similarly, it is
easy to confirm that any attempt to reduce β (by lowering the critical point
below .20) will increase α. In statistics, as in economics, the problem is
trading off conflicting objectives.

The only way to reduce both α and β is to increase the sample size.
From equation (9-4) it is clear that an increase in n will reduce the spread
of P, concentrating its distribution more closely around its central value.
Thus if n is increased from 100 to 200, we obtain the result shown in Figure
9-3. The only difference in this test and the one shown in Figure 9-1 is the
increase in sample size; note how it reduces both α and β.
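The trade-off, and the escape from it offered by a larger sample, can be tabulated with the same normal approximation. This is a sketch only; the function name is hypothetical, and the three cases are the original rule, the rule with the critical point raised to .22, and the original critical point with the sample doubled to 200.

```python
from math import sqrt
from statistics import NormalDist

def alpha_beta(critical_point, n, pi_0=1/6, pi_1=1/4):
    """Approximate (alpha, beta) for the rule 'reject H0 if P > critical_point'."""
    alpha = 1 - NormalDist(pi_0, sqrt(pi_0 * (1 - pi_0) / n)).cdf(critical_point)
    beta = NormalDist(pi_1, sqrt(pi_1 * (1 - pi_1) / n)).cdf(critical_point)
    return alpha, beta

cases = [(0.20, 100), (0.22, 100), (0.20, 200)]
for critical_point, n in cases:
    alpha, beta = alpha_beta(critical_point, n)
    print(f"critical point {critical_point:.2f}, n = {n}: "
          f"alpha = {alpha:.2f}, beta = {beta:.2f}")
```

Raising the critical point lowers α at the expense of β, while doubling n lowers both.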

These principles are illustrated with an interesting legal analogy. In a
murder trial, the jury is being asked to decide between H₀, the hypothesis
that the accused is innocent, and the alternate H₁ that he is guilty. A type
I error is committed if an innocent man is condemned (innocence is rejected),
while a type II error occurs if a guilty man is set free (innocence is accepted).
The judge's admonition to the jury that "guilt must be proved (i.e., innocence
rejected) beyond a reasonable doubt" means that α should be kept very
small. There have been many legal reforms (for example, in limiting the
evidence that can be used against an accused man) which have been designed
to reduce α, the probability that an innocent man will be condemned. But
these same reforms have increased β, the probability that a guilty man will
evade punishment. There is no way of pushing α down to zero, and insuring
absolutely against convicting an innocent man without letting every defendant
[Figure 9-3: the test of Figure 9-1 with the sample size increased to 200. The P axis
shows π₀ = .167, the critical point .20 (accept H₀ to its left, reject H₀ to its right),
and π₁ = .25; now α = .10 and β = .05.]

FIG. 9-3 How α and β are both reduced by increasing sample size (compare with Fig. 9-1).

go free, thus raising β to 1 and making the trial meaningless (powerless).
It should also be noted that historically α and β have been both reduced by
improved crime detection, i.e., by increased available evidence brought to
bear on H₀.

Returning to the statistical problem, we conclude that (short of raising
more funds for increasing sample size) we are left with the problem of how
best to balance, or trade off, α and β. Whenever possible the answer should
take into consideration the factors mentioned at the outset of the chapter.

1. The relative prior likelihood of the two competing hypotheses. To
use an earlier example: if your opponent is a trusted friend, rather than a
complete stranger, your greater prior confidence in H₀ will make you more
reluctant to reject it; thus you will keep α small.

2. The relative cost of making each type of error. To use the same
example: suppose the cost of making a type I error (and accusing an old
friend of being a cheat) is high, while the cost of making a type II error is
relatively low (you continue to bet against a crooked die, but it is only for
peanuts); in these circumstances, your greater concern about making a type I
error will lead you to reduce α to a relatively small value, even though β is
increased as a consequence. Or, drawing on our legal analogy, we may
interpret legal reforms designed to protect the innocent (i.e., reduce α) as a
reflection of the judgement that the cost of type I errors (condemning
innocent men) exceeds the cost of type II errors (allowing the guilty to go
free).

The difficulty is that, in a great deal of scientific inquiry, these questions
cannot be answered with any precision; but because type I errors are usually
quite serious, α is set at a small value, usually 5% or 1%. Then the test rule
is constructed on this basis.

We illustrate with our die-tossing example how hypotheses are typically
tested. Three steps are involved.

1. The null hypothesis H₀ and the alternative H₁ are formally stated,
as in (9-1) and (9-2). At the same time, the sample size (e.g., 100) and the
significance level of the test (e.g., α = 5%) are set.

2. We now assume that the null hypothesis H₀ is true. And we ask:
"What can we expect of a sample drawn from this kind of world?" This
question is answered in our die-tossing example thus: if H₀ is true, then there
is a probability of only 5% that we will observe a sample P greater than .228.
This critical value (.228) is determined as follows. We note from Appendix
Table IV that a Z value of 1.64 cuts a 5% tail off the standard normal
distribution. This critical Z value is translated into a P value:



$$ \frac{P - \pi}{\sqrt{\pi(1 - \pi)/n}} = Z = 1.64 $$   (9-10)

and for the H₀ value of π = 1/6:

$$ \frac{P - 1/6}{\sqrt{(1/6)(5/6)/100}} = 1.64 $$   (9-11)

which yields the critical value of P = .228. The resulting test R* is shown
in Figure 9-4. This shows us what to expect of a sample P, if H₀ is true. (At
this stage, the probability of a type II error (β) and the power of this test
(1 − β) may also be calculated, but this is not always done. As an exercise,
the reader should confirm that β = .31 and the power of the test is .69.)
With the rule R* now established, there remains only the last automatic step.

3. The sample is taken, and P observed. We now ask: "Is this P consistent
with H₀?" If it is not (i.e., P > .228) we reject H₀.

As an example, recall that in our 100 tosses, we rolled 27 aces. This
observed P = .27 is in such conflict with H₀ that it cannot be "reasonably"
attributed to chance and H₀ is rejected.
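
The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original text: the function names are hypothetical, the normal approximation to the distribution of P is assumed throughout, and the exact 5% point of the standard normal (about 1.645) is used in place of the table value 1.64, so results may differ from the text in the third decimal.

```python
from math import sqrt
from statistics import NormalDist


def critical_proportion(pi0, n, alpha):
    """Upper-tail critical value of the sample proportion P under H0: pi = pi0."""
    z = NormalDist().inv_cdf(1 - alpha)  # about 1.645 when alpha = .05
    return pi0 + z * sqrt(pi0 * (1 - pi0) / n)


def type_ii_error(pi1, n, critical):
    """beta = Pr(P < critical) when the true proportion is pi1 (normal approximation)."""
    se1 = sqrt(pi1 * (1 - pi1) / n)
    return NormalDist().cdf((critical - pi1) / se1)


# Steps 1 and 2: state H0 (pi = 1/6), n = 100, alpha = .05, and build the rule.
critical = critical_proportion(pi0=1 / 6, n=100, alpha=0.05)
# Optional: beta and power against the simple alternative pi = .25.
beta = type_ii_error(pi1=0.25, n=100, critical=critical)
# Step 3: compare the observed proportion (27 aces in 100 tosses) with the rule.
observed_p = 27 / 100

print(f"critical value of P: {critical:.3f}")
print(f"beta: {beta:.2f}, power: {1 - beta:.2f}")
print("reject H0" if observed_p > critical else "do not reject H0")
```

Keeping the construction of the rule separate from the comparison in step 3 mirrors the text: the rule is fixed before the sample is examined.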

Summary. Whereas in our first test (R) we arbitrarily specified the
critical value (.20) and solved for α, in this more typical hypothesis test
(R*) we specify α (.05) and solve for the critical value. Note that the "95%
confidence level" of this test is similar to the concept of 95% confidence
used in interval estimation. We set up the test in Figure 9-4 on the assumption
that H₀ is true. If this is so, there is a 95% probability that P will be observed
below .228, and we will (correctly) accept H₀. Thus we are using a method
in which there is a 95% probability that we will be right when H₀ is true.

[Figure: distribution of P under H₀, with a right-hand tail of area α = .05
beyond the critical value, which yields our decision rule R*: accept H₀ below
the critical value, reject H₀ above it.]

FIG. 9-4 Construction of a test of H₀, the hypothesis that π = .167, at a 5% significance
level (α = .05) with a sample n = 100.

There is another way of looking at this testing procedure. If we get an 
observed P exceeding .228, there are two explanations. 

1. H 0 is true, but we have been exceedingly unlucky and got a very 
improbable sample P. (We're born to be losers; even when we bet with odds 
of 19 to 1 in our favor, we still lose); or 

2. H 0 is not true after all; the die is crooked, and it is no surprise that 
we rolled so many aces. 

Being reasonable, we opt for the second explanation. Although the 
first explanation is conceivable, it is not as plausible as the second. But we 
are left in some doubt; it is just possible that the first explanation is the correct
one. For this reason we qualify our conclusion "to be at the 5 % significance 
level (type I error level)." 



PROBLEMS 



9-1 Fill in the blanks. 

Consider the problem facing a radar operator whose job is to detect 
enemy aircraft. When something irregular appears on the screen, he 
must decide between

H₀: all is well; only a bit of interference on the screen,
H₁: an attack is coming.

In this case, the type ____ error is a "false alarm," and the type ____
error is a "missed alarm." To reduce both α and β, the electronic
equipment is made as reliable and sensitive as possible.

9-2 (a) To test whether a die has a fair number of aces, using n = 100
construct a test, at the 1% significance level, of

H₀: π = .167
versus H₁: π = .25

(b) What is β?

9-3 (a) Construct the appropriate test of

H₀: coin unbiased, versus the alternative H₁: Pr (heads) = .60
using a 25% level of significance. Assume a sample of 100
tosses.

(b) Do the sample results you observed in Problem 3-2 lead you to
reject H₀? About how many students in your class will mistakenly
reject H₀?

(c) What is β for this test? Interpret.

9-4 (a) To test whether a die has a fair number of aces, using n = 100
construct an appropriate test, at the 5% significance level, of

H₀: π = .167
versus H₁: π = .300

(b) What are α and β for this test?

(c) Compared with the test developed in the text in Figure 9-4, is

(1) The critical value different?

(2) α different?

(3) β different?



9-2 COMPOSITE HYPOTHESES 



(a) Introduction 

In our die-tossing example we have assumed that there is only one way
in which a die can be crooked (i.e., π = 1/4). Thus the alternative hypothesis
H₁ was a simple one. But usually there is no way of knowing how heavily
the die may be biased against us. Thus our alternative hypothesis H₁ (crooked
die) would be a composite hypothesis, embracing a whole set of possibilities:

$$\begin{aligned} H_1:\ \pi &= .17 \\ \pi &= .18 \\ \pi &= .19 \\ &\ \,\vdots \end{aligned} \tag{9-12}$$

including our previous, simple alternative π = .25.

To summarize, we wish to test

$$H_0:\ \pi = .167 \tag{9-13}$$

against the composite alternative

$$H_1:\ \pi > .167 \tag{9-14}$$

Since there are many alternatives included in H₁, we can no longer evaluate
β as simply as in the previous section. But note that H₀ is still a simple
hypothesis; thus the evaluation of α has not been complicated. There is now,
therefore, an even stronger case for concentrating on α, which we set at
.05; we shall return to an evaluation of the more complicated β values later.

With this significance level of .05 given, the reader should now develop
as an exercise the appropriate decision rule for accepting or rejecting H₀.
Note that this is identical to the rule developed in Figure 9-4. Since the rule
is based on the level of significance selected (α = .05 in both cases), it is
entirely independent of any considerations of β. But while the formal test
may remain the same, there are two major changes in its interpretation.



(b) The Power Function 

With a simple alternative H₁, β was a single probability value. With
a composite H₁ there are now many possible values of π, each giving a
different β. We show three such calculations in Figure 9-5; each involves
evaluating the area under a curve, lying to the left of the critical value
(P = .228). Thus the middle curve shows how the sample P will be distributed
if the true π is .25 and yields β = 31%. To interpret: if π is in fact .25, then
there is a probability of 31% that an observed P will be less than our critical
value of .228, and we will erroneously conclude that the die is fair (accept
H₀).

FIG. 9-5 Calculation of the probability of type II error for H₁: π > .167 (n = 100).

Table 9-2 has been constructed using a whole set of possible values of
π; the corresponding β values are shown in column 2, and the power of the
test (1 − β) is shown in column 3; this is the probability that we will correctly
reject a crooked die. If a dishonest gambler uses a die which always turns up
an ace, he knows he will be quickly found out, and the game abandoned;
the more crooked the die he uses, the greater your "power" to uncover him
as a cheat. The less crooked the die (i.e., as we move down this column),
the more difficult it becomes to reject it. The dishonest gambler will recognize
this, and will prefer to get you to play against a slightly crooked die. The
power of this test is thus seen to be our ability to uncover a crooked die;
and if it is only slightly biased, our test has little power, and it becomes
almost impossible to distinguish between the two conflicting hypotheses.
This is confirmed from the last, limiting line in Table 9-2. Here the value of
π is that of H₀, and to reject H₀ would be wrong. The probability of this was α = 5%
by definition.

The "power function" is graphed in Figure 9-6. Clearly we should like a
power function that begins very close to the baseline, since its initial height is
α, the level of significance, which we wish to keep low. At the same time, we
wish the power function to be very steep; the more rapidly it rises, the
greater is our power to distinguish between competing hypotheses.
Table 9-2  β and Power Function for the Test R*
(Test of Fair Die, at 5% Level of Significance)

(1)                (2)                             (3)
Possible Values    Probability of (Erroneously)    Probability of Correctly
of π               Accepting H₀                    Rejecting H₀
                   β                               Power = 1 − β

.32                .02                             .98
.30                .05                             .95
.28                .12                             .88
.26                .23                             .77
.24                .39                             .61
.22                .58                             .42
.20                .76                             .24
.18                .89                             .11
.17                .94                             .06
Limit (.167)       (.95)                           (.05) = α
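
The entries of Table 9-2 can be approximated with a short calculation. The sketch below is an illustration under stated assumptions, not the authors' procedure: it uses the normal approximation for P, the exact 5% point of the standard normal rather than the rounded table value, and a hypothetical helper named `power`. For these reasons an entry may differ from the printed table in the second decimal.

```python
from math import sqrt
from statistics import NormalDist

N = 100
PI0 = 1 / 6
ALPHA = 0.05
STD_NORMAL = NormalDist()

# Critical value of P for the one-sided test R* (about .228).
CRITICAL = PI0 + STD_NORMAL.inv_cdf(1 - ALPHA) * sqrt(PI0 * (1 - PI0) / N)


def power(pi):
    """Pr(P > CRITICAL) when the true proportion is pi (normal approximation)."""
    se = sqrt(pi * (1 - pi) / N)
    return 1 - STD_NORMAL.cdf((CRITICAL - pi) / se)


# One line per row of the table, ending with the limiting case pi = pi0.
for pi in (0.32, 0.30, 0.28, 0.26, 0.24, 0.22, 0.20, 0.18, 0.17, PI0):
    print(f"pi = {pi:.3f}   beta = {1 - power(pi):.2f}   power = {power(pi):.2f}")
```

At π equal to the null value, the computed power reduces to α, which is the limiting line of the table.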



(c) A Warning About Accepting H 0 

This introduces a second reason for interpreting our test in this section
(with a composite H₁) differently from our test in the previous section (with
a simple H₁). It is now possible that this die is only slightly biased (π = .18).
If this is in fact the case, it is very likely (β = .89) that we will observe P in
the range below .228. R* tells us to accept H₀ (true die); but this is a mistake.
At the same time the evidence is not strong enough to reject H₀. What to do?
The only other apparent option is to suspend judgement; or formally "not
reject H₀." It is only in this way that we can avoid the great risk of incurring
a type II error. Earlier, with a simple H₁ (substantially different from H₀),
we could live with the risk of type II error; but with our composite alternative
this risk becomes prohibitive (β can run up to .95). Thus we prefer to "not
reject H₀," "suspend judgement," or conclude that "our sample P is not
statistically significant, i.e., P is not significantly greater than 1/6." But we
do not accept H₀ outright.

[Figure: power (1 − β) plotted against possible values of π, rising from
π = .167 (= H₀) through .20 and .30.]

FIG. 9-6 Graph of the power function of Table 9-2, for the test of the fair die at the
5% significance level.

We confirm this from another point of view. Suppose we are using test
rule R* on a biased die (π = .21); we toss the die and observe P = .21. This
would not be an unlucky result; in fact it is the best luck that we could have
hoped for, since our estimate P is exactly on the true value π. If we started
out suspecting bias, it has now been confirmed by our sample. There are no
grounds whatsoever for concluding that this is a fair die. We cannot accept H₀.
Since we also cannot reject H₀, we suspend judgement.³

There is another alternative which is generally even more attractive and
to this we shall now turn.



(d) Prob-value 

The prob-value⁴ is defined as

$$\text{Prob-value} = \Pr\left(\text{the sample value would be as extreme as the value we actually observed} \mid H_0\right) \tag{9-15}$$



3 This section illustrates a problem involved in accepting H₀ if the sample size is small;
note how an increase in sample size, by reducing our standard error, would eventually
allow us to reject H₀, presuming, of course, that we continued to observe P = .21.

On the other hand, if the sample size is extremely large we can fall into a trap in
rejecting H₀. To see why, we consider more carefully the question of whether any die is
absolutely true, with π = 1/6 exactly. The answer to this must be no; H₀: π = 1/6, like any
other simple hypothesis, must be (slightly) false. And all we have to do to reject it is to take
a large enough sample, thus reducing our standard error of estimate to the point where even
an observed P just slightly different from 1/6 will call for rejection of H₀, i.e., be
"statistically significant." But concluding that this die is dishonest misses the point: it is not
dishonest enough to be of any practical consequence. We therefore must distinguish
between statistical significance, and practical importance.

In conclusion, sample size is obviously an important consideration in hypothesis
testing. If the sample is very large, rejecting H₀ may be very dangerous; on the other hand,
if the sample is very small, accepting H₀ is dangerous (and this is the more critical problem
for economists and other social scientists, with their limited available information).

4 Short for "probability-value." It is sometimes further contracted to "p-value"; we do not
use this term, to avoid confusion.
For the gambling example, in tossing a die 100 times and observing the
proportion of aces to be P = .27, we have

$$\text{prob-value} = \Pr(P \ge .27 \mid H_0) \tag{9-16}$$

$$= \Pr(Z \ge 2.77) = .0028 \tag{9-17}$$

This calculation is very similar to the calculation of α, and is shown in Figure
9-7a. We further note that if the observed value of P is extreme, the
prob-value is very small. Thus the prob-value measures the credibility of H₀. It is an
excellent way for the scientist to summarize what the data says about the
null hypothesis.

The relation of prob-value to testing H₀ may be seen in Figure 9-7b.

[Figure: two panels showing the distribution of P under H₀, centered at .167,
with the observed P = .27 marked in the right-hand tail.]

FIG. 9-7 Prob-value for the gambling example; H₀ is π = 1/6 and sample size is n = 100.
(a) Calculation of prob-value when observed P = .27. (b) Fig. 9-4 repeated to show
relation of prob-value to α: Reject H₀ iff prob-value < α.



Since the prob-value is smaller than α, the observed value of P is in the
rejection region of the test, i.e.,

$$\text{Reject } H_0 \text{ iff prob-value} < \alpha \tag{9-18}$$

To restate this, we recall that the prob-value is a measure of the credibility
of H₀; if this credibility sinks below α, then H₀ must be rejected.

Figure 9-7 shows yet another interpretation: the prob-value is the smallest
possible value of α at which H₀ may be rejected.

To conclude, a major criticism of the traditional hypothesis testing of
Section 9-1 is that α is set rather arbitrarily, and the simple decision to reject
or not reject H₀ does not allow the sample to "tell us" all that it might.
Prob-value is therefore the preferred way of stating the result of a hypothesis
test. Then each reader can set his own level of significance α at whatever
value he deems appropriate, and make his own decision to reject H₀ if the
prob-value < α. [If the prob-value > α, he should suspend judgement for
the reasons cited in Section 9-2(c) above.]
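
The rule (9-18) can be checked against the critical-value rule of Section 9-1 for the die example. The following is a minimal sketch, assuming the normal approximation for P; the names `prob_value` and `rejects_by_critical_value` and the particular list of α levels are illustrative choices, not taken from the text.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
PI0, N, P_OBS = 1 / 6, 100, 0.27
SE0 = sqrt(PI0 * (1 - PI0) / N)  # standard error of P when H0 is true

# Prob-value: Pr(P >= .27 | H0), as in (9-16) and (9-17).
prob_value = 1 - STD_NORMAL.cdf((P_OBS - PI0) / SE0)


def rejects_by_critical_value(alpha):
    """Traditional rule: reject H0 when the observed P exceeds the critical value."""
    critical = PI0 + STD_NORMAL.inv_cdf(1 - alpha) * SE0
    return P_OBS > critical


print(f"prob-value = {prob_value:.4f}")
for alpha in (0.10, 0.05, 0.01, 0.001):
    by_rule = rejects_by_critical_value(alpha)
    by_prob_value = prob_value < alpha
    print(f"alpha = {alpha}: critical-value rule {by_rule}, prob-value rule {by_prob_value}")
```

The two rules agree at every level tried, which is the sense in which a reader who is given only the prob-value can apply his own α.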

Another Example. Suppose that an auto firm has been using brake 
linings with a stopping distance of 90 feet. The firm is considering a switch to 
another type of lining, which is similar in all other respects, but alleged to 
have a shorter stopping distance. In a test run the new linings are installed 
on 64 cars; the average stopping distance is 87 feet, with a standard deviation 
of 16 feet. In your job of quality control, you are asked to evaluate whether 
or not the new lining is better. 
Let

μ = average stopping distance for the population of new linings

and test

$$H_0:\ \mu = 90$$

against the alternative

$$H_1:\ \mu < 90$$

Noting that the observed X̄ is 87, you calculate the prob-value, using a
method similar to (9-16):

$$\text{prob-value} = \Pr(\bar X \le 87 \mid H_0) \tag{9-19}$$

In other words, this is the probability that X̄ will be as extreme as the value
you observed, i.e., 3 feet or more below the hypothetical value of 90 feet.
Translating (9-19) into Z values, we have

$$\text{prob-value} = \Pr\left(\frac{\bar X - \mu_0}{\sigma/\sqrt{n}} \le \frac{87 - 90}{16/\sqrt{64}}\right) = \Pr(Z \le -1.5) = .067 \tag{9-20}$$

You report therefore that there is evidence that the new linings are better,
since there is only a 6.7% probability that you would get such extreme
test results from an equivalent product. Thus you leave the decision to the
vice-president. If he uses a 10% significance level, he will switch to the new
linings. But if he uses a 5% significance level, he will not switch.
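
The brake-lining calculation can be sketched as follows. This is an illustration only: the function name `lower_tail_prob_value` is hypothetical, and, as in the text, the sample standard deviation of 16 feet is used in place of the unknown population value, with the large sample (n = 64) justifying the normal distribution.

```python
from math import sqrt
from statistics import NormalDist


def lower_tail_prob_value(x_bar, mu0, s, n):
    """Pr(sample mean <= x_bar | H0: mu = mu0), using the normal distribution."""
    z = (x_bar - mu0) / (s / sqrt(n))
    return NormalDist().cdf(z)


prob_value = lower_tail_prob_value(x_bar=87, mu0=90, s=16, n=64)
print(f"prob-value = {prob_value:.3f}")

# The same prob-value leads to different decisions at different levels.
for alpha in (0.10, 0.05):
    decision = "switch to the new linings" if prob_value < alpha else "do not switch"
    print(f"alpha = {alpha}: {decision}")
```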

(e) How to Select H 0 

So far we have tested a simple H₀ against both a simple and composite
H₁. Cases occasionally occur when both H₀ and H₁ are composite. As an
example, suppose we are asking whether American men are more likely to
vote Democratic than American women. The null hypothesis (that voting
preference is the same) contains many simple hypotheses.

$$\begin{aligned} H_0:\ \pi_M &= \pi_W = .50 \\ \pi_M &= \pi_W = .51 \\ &\ \,\vdots \\ \pi_M &= \pi_W = x, \quad 0 < x < 1 \end{aligned}$$

where π_M and π_W represent the proportion of men and women voting
Democratic.

Moreover, the alternate hypothesis is even more composite.⁵

$$\begin{aligned} H_1:\ \pi_M &= .51 \text{ and } \pi_W = .50 \\ \pi_M &= .52 \text{ and } \pi_W = .50 \\ \pi_M &= .52 \text{ and } \pi_W = .49 \\ &\ \,\vdots \\ \pi_M &= x \text{ and } \pi_W = y, \quad 0 < x < 1,\ 0 < y < 1,\ x > y \end{aligned}$$

Additional complications are now involved, above and beyond those
introduced when only H₁ was composite. Indeed it is difficult to know where
to start. The key is to define a new population parameter δ, the difference
between voting preferences. Specifically

$$\delta = \pi_M - \pi_W$$

The null hypothesis becomes

$$H_0:\ \delta = 0 \tag{9-21}$$

against the composite alternative

$$H_1:\ \delta > 0 \tag{9-22}$$

For large samples, a test can now be constructed on the basis of the difference
in the sample proportions, P_M − P_W.

5 We refer to H₀ as one-dimensional, and H₁ as two-dimensional.

As this illustration makes clear, the null hypothesis may sometimes be
uninteresting, and one that we neither believe nor wish to establish. It is
selected because of its simplicity. It is the alternative H₁ that we are trying to
establish, and we prove H₁ by rejecting H₀. We can see now why statistics is
sometimes called "the science of disproof." H₀ cannot be proven;⁶ and H₁
can be proven only by disproving (rejecting) H₀. It follows that if we wish to
prove some proposition, we call it H₁ and set up the contrary hypothesis H₀
as the "straw man" we hope to destroy.



Another Example. Suppose that the research engineers in an electronics
company claim that they have developed a new television tube superior to the
old, which had an average lifetime of 12,400 hours. They ask you to prove
its superiority. You wish to establish

$$H_1:\ \mu > 12{,}400$$

where μ is the average lifetime of all new tubes. The "straw man" you hope
to destroy is that this tube is no better, i.e.,

$$H_0:\ \mu = 12{,}400$$

The new tube is then tested in the hope that the observed sample mean will be
significantly greater than 12,400. If it is, then H₀ is rejected and H₁ is
established.

This example emphasizes our earlier warning against accepting H₀.
Suppose the sample mean X̄ is slightly above μ₀, yielding a prob-value of
20%. If the vice president specifies the significance level α at 5%, the evidence
is not strong enough to allow us to reject H₀. But we cannot accept H₀ either,
for two reasons: (1) we did not believe it in the first place; it was set up simply
as a "straw man" we hoped to knock over in order to establish H₁; (2) the
tests suggest H₀ is wrong (although not as strongly as we would have liked).
We therefore opt to withhold judgement, simply quoting the prob-value, and
wait for further evidence.



6 See Section 9-2 (c) above. 






HYPOTHESIS TESTING 



PROBLEMS 

9-5 A certain type of seed has always grown to a mean height of 8.5 inches. 
A sample of 49 seeds grown under new conditions has a mean height 
of 8.8 inches and a standard deviation of 1 inch. 

(a) At the 5% significance level, test the hypothesis that the new 
conditions grow no better plants. 

(b) Graph the power function of this test. 

9-6 Whereas the power function of our die test involved graphing all the 
(1 — j3) values in column 3 of Table 9-2, an "operating characteristics 
curve" (OCC) is defined by graphing all /? values (column 2). Draw 
the operating characteristics curve for this test, and compare it with 
the power function in Figure 9-6. What is the most desirable shape for 
an OCC? 

9-7 A man makes the implausible claim that the average yearly salary of
men in a certain profession is only $6600. A random sample of 150
men in that profession shows a mean salary of $6730 with a standard
deviation of $900.

(a) Calculate the prob-value, and interpret. 

(b) At a 5 % level of significance would you reject the man's claim? 

(c) At a 1 % level of significance, would you reject his claim? 
Would you therefore accept it? Explain your answer. 

9-8 A coffee shop sells on the average 320 cups of coffee per day, with a 
standard deviation of 40. After advertising, they find that on 7 days 
they sell an average of 350 cups.

(a) Has advertising left their business unchanged ? Calculate the prob- 
value. 

(b) If the owner of the coffee shop specifies that the type I error of 
the test (significance level) is to be 5%, do you reject the hypothesis 
that business is unchanged? 

(c) What assumptions have you made implicitly in parts (a) and (b)? 
Under what conditions are they questionable? 

*(d) If coffee sales can be observed for 25 days, what would the average 
sales have to be in order to justify a statement that business had 
improved, at the 5% significance level? 
9-9 In order to compare the yearly incomes in two professions a survey
was made among 100 men in each. In one sample the mean income is
$6000 with a standard deviation of $700; in the second sample the
mean is $6200 with a standard deviation of $400. To weigh the claim
that the mean salary in the second profession is no larger than in the
first profession, calculate the prob-value. (Hint: Use the theory on
differences in means developed in Chapter 8.)

(9-10) Records show that in a random sample of 100 hours a machine
produced an hourly average of 678 articles with a standard deviation
of 25. After a safety device was installed, in a random sample of 500
hours the machine produced an hourly average of 674 articles with
a standard deviation of 5. Pointing to the drop of 4 articles per hour
in the sample mean, management claimed the safety device was
reducing production. The union countered that the drop of 4 articles
was "merely statistical fluctuation."

(a) To objectively summarize the evidence on whether production is
left unchanged, calculate the prob-value.

(b) If the arbitration board decides that α = 1% is a fair level of
significance (type I error), do they rule in favor of management or
union?

*9-11 At a 5% significance level, test the hypothesis that the following
sample is drawn from a population of random digits.

I 2 1 2 0 8 e 6 1 2 4 5 J 2 4 8 4 4 3 0 2 

Suppose that the alternate hypothesis is that there is a bias towards
small digits.

*9-12 The output of all machines in a factory is substandard 4% of the time.
A machine suspected of being inferior produces X substandard
articles in 400. How small would X have to be in order to reject the
machine as inferior at the 5% significance level?



9-3 TWO-SIDED TESTS

In the previous section we asked whether men voters were more heavily
Democratic than women. Suppose instead we ask whether men voters are
more or less Democratic than women. In either case we use the same simple
null hypothesis

$$H_0:\ \delta = 0 \tag{9-23}$$

But where in (9-22) we used a one-sided alternative

$$H_1:\ \delta > 0$$

we now must use the two-sided alternative

$$H_1:\ \delta \neq 0, \text{ which is equivalent to } \delta > 0 \text{ or } \delta < 0 \tag{9-24}$$

We reject H₀ if our sample estimate of δ is significantly greater than, or less
than 0. We test H₀ "from both sides."
As a second illustration, suppose we are again testing the trueness of a
die. But instead of testing a suspect die we are betting against, suppose we
work in quality control in a die-making factory. We are now just as
concerned about a die that shows too few aces, as one which shows too many.
Our appropriate test involves:

$$H_0:\ \pi = .167 \tag{9-25}$$

[(9-13) repeated] against the two-sided alternative

$$H_1:\ \pi \neq .167, \text{ i.e., } \pi > .167 \text{ or } \pi < .167 \tag{9-26}$$

[compare with (9-14)]. The critical region (for rejecting H₀) must now also be
two-sided. For a level of significance α = 5% this is shown in Figure 9-8b;
an equal area (2½%) is cut off each tail in order to keep the critical region for
rejecting H₀ as large as possible. Thus:

$$\text{Reject } H_0 \text{ if } |Z| = \left|\frac{P - \pi_0}{\sqrt{\pi_0(1 - \pi_0)/n}}\right| > 1.96 \tag{9-27}$$

where π₀ is .167, the null hypothesis value of π. Equation (9-27) simply asks
whether P differs from π₀ by a critical amount, on either the high side or
the low side. The final question is "How do we recognize when to use a
two-tailed test or a one-tailed test?" The one-tailed test is recognized by an
asymmetrical phrase like "more than, less than, at least, no more than, better,
worse, ..." and so on. Thus our first test of whether the probability of an
ace on the gambler's die was more than one sixth, required a one-sided test.

[Figure: two panels showing the distribution of P under H₀ with the region
"Reject H₀" shaded. In panel (b), α = 5% but the area in each tail is 2½%.]

FIG. 9-8 A one-sided and two-sided test of a die compared. (a) A one-sided test of
H₀: π = .167 against the alternative H₁: π > .167 (Fig. 9-4 repeated). (b) A two-sided test
of H₀: π = .167 against the alternative H₁: π ≠ .167.
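
The two-sided rule can be sketched in Python as below. This is an illustrative sketch under the normal approximation; the function name `two_sided_test` is hypothetical, and the observed proportions .10 and .08 are made-up values chosen only to show the lower tail at work (only .27 comes from the text).

```python
from math import sqrt
from statistics import NormalDist


def two_sided_test(p_obs, pi0, n, alpha=0.05):
    """Return (z, z_crit, reject) for H0: pi = pi0 against H1: pi != pi0."""
    se0 = sqrt(pi0 * (1 - pi0) / n)
    z = (p_obs - pi0) / se0
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = .05
    return z, z_crit, abs(z) > z_crit


# .27 is the proportion observed in the text; .10 and .08 are hypothetical.
for p_obs in (0.27, 0.10, 0.08):
    z, z_crit, reject = two_sided_test(p_obs, pi0=1 / 6, n=100)
    verdict = "reject H0" if reject else "do not reject H0"
    print(f"P = {p_obs:.2f}: Z = {z:+.2f}, critical |Z| = {z_crit:.2f}, {verdict}")
```

Splitting α equally between the tails is what raises the critical Z from about 1.64 in the one-sided test to about 1.96 here.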



PROBLEMS 



9-13 Test H₀: π = 1/2 versus H₁: π ≠ 1/2 where π is the probability of
tossing a thumbtack "point up." Use a 5% level of significance, and
use the sample observations of Problem 3-1.

(a) After 10 tosses.

(b) After 100 tosses.

9-14 Referring to Problem 9-7, suppose that the man's claim that μ₀ = 6600
is no longer implausibly low, i.e., suppose the alternate hypothesis is
two-sided: H₁: μ > 6600 or μ < 6600.

Using now a two-sided test of H₀, and also a two-sided prob-value,
answer the same questions as in Problem 9-7.



9-4 THE RELATION OF HYPOTHESIS TESTS TO
CONFIDENCE INTERVALS

(a) Two-sided Hypothesis Tests 

In this section we shall reach a very important conclusion: a confidence 
interval can be used to test any hypothesis; in fact, the two procedures are 
equivalent. We illustrate with an example. 

Suppose a firm has been producing a light bulb with an average life of
800 hours. It wishes to test a new bulb. A sample of 25 new bulbs has an
average life of 810 hours (X̄), with a standard deviation of 30 hours (s).
Noting that because of our small sample we should use the t, rather than the
normal distribution, we can either

1. Test the hypothesis

$$H_0:\ \mu_0 = 800 \tag{9-30}$$

against the alternative

$$H_1:\ \mu \neq 800 \tag{9-31}$$

H₀ may be accepted⁷ at the 5% level of significance if

$$|\text{observed } t| = \left|\frac{\bar X - \mu_0}{s/\sqrt{n}}\right| < t_{.025} \tag{9-32}$$

i.e., if μ₀ − 2.06 s/√n < X̄ < μ₀ + 2.06 s/√n.

Given our sample s, along with our hypothesis μ₀, this condition becomes

$$788 < \bar X < 812 \tag{9-33}$$

Since our observed X̄ (810) does fall within this interval, μ₀ is acceptable.
This is shown in Figure 9-9a.



[Figure: (a) the region "Accept μ₀" runs from 788 to 812 around the hypothetical
value μ₀ = 800, with the observed value X̄ = 810 inside it. (b) The confidence
interval runs from 798 to 822 around the observed X̄ = 810. In both panels the
half-width is t.025 s/√n = 2.06 (30/√25) = 12.]

FIG. 9-9 Comparison of two-sided hypothesis test with confidence interval (using a
sample with X̄ = 810 and s = 30). (a) Test of H₀: μ₀ = 800 versus H₁: μ ≠ 800. (b)
Confidence interval for μ.

7 More specifically, "H₀ should not be rejected." To simplify the exposition in this section
and avoid double negatives, we shall use "accept H₀," rather than "do not reject H₀,"
although as we have pointed out earlier, the latter (weaker) conclusion may be the only
one justified.
2. Alternatively, the sample result could be used to construct a confidence
interval for μ. Using the same 95% level of confidence, this confidence
interval is defined in (8-15) as:

$$\mu = \bar X \pm t_{.025}\frac{s}{\sqrt{n}} = 810 \pm 2.06\,\frac{30}{\sqrt{25}} \tag{9-34}$$

or

$$798 < \mu < 822 \tag{9-35}$$

This is shown in Figure 9-9b.

The observed X̄ of 810 falls in the acceptable region defined in the
hypothesis test in Figure 9-9a; hence μ₀ is acceptable. At the same time, in
Figure 9-9b we note that μ₀ falls within the confidence interval.

This is the key point: if and only if μ₀ falls within this confidence interval,
will it be an acceptable hypothesis. This is clear from the diagram, since the
interval we use is the same length in both cases: it is constructed by adding
and subtracting precisely the same error allowance (t.025 s/√n = 12).
Provided the sample mean X̄ and μ₀ differ by less than this, μ₀ will fall in the
confidence interval, and will also be an acceptable hypothesis. This holds for
any μ₀. (To confirm, note that μ₀ = 797.6 would be just barely contained in
the confidence interval at the bottom; at the same time this hypothetical
value would shift our acceptable region to the left in the top diagram to the
point where our sample X̄ = 810 would just barely remain in that region.
But any smaller hypothetical value of μ will fall outside our confidence
region and be rejected.)

It can be proven, in general, that⁸

H₀ is accepted if and only if the relevant confidence
interval contains H₀                                        (9-36)

noting, of course, that the level of confidence (e.g., 95%) must match the
level of type I error (level of significance, 5%).

8 For a general algebraic proof (rather than geometric interpretation) for (9-36), consider
the basis of both the confidence interval and hypothesis test. (We illustrate with the normal
test of X̄, but our remarks are equally valid for most tests.) With 95% probability,

$$\left|\frac{\bar X - \mu}{\sigma/\sqrt{n}}\right| < 1.96 \tag{9-37}$$

In deciding whether to accept the null hypothesis μ₀, we first fix μ₀, and then see whether
the observed X̄ satisfies this inequality.

In constructing a confidence interval, we first observe X̄; then the values of μ which
satisfy (9-37) form our confidence interval. μ₀ will be in the confidence interval if and only
if the hypothesis μ₀ is accepted, for in both cases we have

$$\left|\frac{\bar X - \mu_0}{\sigma/\sqrt{n}}\right| < 1.96$$
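
The equivalence (9-36) can be checked numerically for the light-bulb example. This is only a sketch under stated assumptions: the critical value 2.06 is taken from the text (the t table for 24 degrees of freedom) and hard-coded, since the Python standard library has no t distribution, and the grid of candidate values of μ₀ is an arbitrary illustrative choice.

```python
from math import sqrt

X_BAR, S, N = 810.0, 30.0, 25
T_025 = 2.06  # table value quoted in the text for 24 degrees of freedom
ALLOWANCE = T_025 * S / sqrt(N)  # error allowance, about 12


def accepted_by_test(mu0):
    """Two-sided t test at the 5% level: accept when |observed t| < t.025."""
    return abs((X_BAR - mu0) / (S / sqrt(N))) < T_025


def in_confidence_interval(mu0):
    """95% confidence interval: X_BAR plus or minus the error allowance."""
    return X_BAR - ALLOWANCE < mu0 < X_BAR + ALLOWANCE


# Candidate hypotheses 790.0, 790.5, ..., 830.0 (an arbitrary grid).
candidates = [790 + 0.5 * k for k in range(81)]
agree = all(accepted_by_test(m) == in_confidence_interval(m) for m in candidates)

print(f"interval: {X_BAR - ALLOWANCE:.1f} to {X_BAR + ALLOWANCE:.1f}")
print(f"mu0 = 800 accepted: {accepted_by_test(800)}")
print(f"test and interval agree on every candidate: {agree}")
```

The agreement is no accident: both functions test the same inequality, once solved for X̄ and once solved for μ.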



*(b) One-sided Hypothesis Tests 

Equation (9-36) remains true for a one-sided test of a hypothesis,
provided, of course, that we use a one-sided confidence interval, as shown in
Figure 9-10. Using the same sample result, we see that the observed X̄ of
810 falls in the acceptable region defined in the hypothesis test in Figure
9-10a; hence μ₀ is acceptable. At the same time, in Figure 9-10b we note that
μ₀ falls within the confidence interval. This illustrates once more that H₀ is
accepted if and only if the confidence interval contains H₀.

[Figure: (a) the region "Accept μ₀" extends up to 810.3, with the hypothetical
value μ₀ = 800 and the observed X̄ = 810 marked. (b) The one-sided confidence
interval extends upward from 799.7, with the observed X̄ = 810 marked.]

FIG. 9-10 Comparison of one-sided hypothesis test and confidence interval (using same
sample result as Fig. 9-9). (a) Test of H₀: μ₀ = 800 versus H₁: μ > 800. (b) Confidence
interval for μ.

The reasons for one-sided hypothesis tests have been established at
length in this chapter. These same reasons justify the use of one-sided
confidence intervals too. Suppose, for example, that the federal government is
considering construction of a multipurpose dam in a river basin. Suppose
further that the cost of this installation is $100 million. The problem is:
would the benefits from the project exceed this cost?

To get an idea of irrigation benefits, suppose we run a careful calculation
of the operation of a random sample of 25 farmers in the river basin, and
estimate that the net profit (per 100 acres) will increase on the average by
$810 (with a standard deviation of $30). To simplify the exposition, we have
used the same numbers as in Figures 9-9 and 9-10, except that X̄ and μ now
refer to the average increase in profit.

The best point estimate of \(\mu\) (average profit increase) is 810. But if we
use this in our benefit calculations, we will take no account of its reliability;
i.e., it may be way too high, or way too low. Now consider the alternative
estimate of $799.7, the critical point in our one-sided confidence interval in
Figure 9-10. We can be 95% confident that this figure understates \(\mu\). We don't
know by how much, but this doesn't matter; the point is that we are almost
certain that this underestimates benefits. Suppose we use similar
underestimates of other benefits (flood control, recreation, etc.) and that these⁹
sum to $110 million. We can now be very confident that benefits exceed costs,
since at each stage we have consciously underestimated benefits. From a
policy point of view this is a much stronger conclusion than that the "best
estimate" of benefits is $120 million, since the reliability of this estimate
remains a mystery. (This strategy clearly has a major drawback. An
understatement of benefits may reduce the estimated benefits below cost, in which
case we would have to start all over again.)

Thus, by "cooking the case" against our conclusion, we strengthen it.
Economists often apply this general philosophy in another way by selecting
adverse assumptions in order to strengthen a policy conclusion; they may
use one-sided confidence intervals in the future for the same reason.
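
The deliberately low estimate of $799.7 can be reproduced with a few lines of arithmetic. The sketch below is only illustrative: the sample figures (25 farms, mean $810, standard deviation $30) come from the example above, while the critical value 1.711 is an assumption taken from a standard t table for 24 degrees of freedom (5% in one tail), since it is not quoted in this section.

```python
import math

# Sample summary from the irrigation example in the text.
n = 25          # farms sampled
x_bar = 810.0   # mean increase in net profit per 100 acres ($)
s = 30.0        # sample standard deviation ($)

# ASSUMPTION: one-tailed 5% critical value of t with n - 1 = 24 degrees of
# freedom, copied from a standard t table (not quoted in this section).
T_05_24DF = 1.711

standard_error = s / math.sqrt(n)                  # 30 / 5 = 6
lower_bound = x_bar - T_05_24DF * standard_error   # deliberately low estimate

print(f"standard error  = {standard_error:.2f}")
print(f"95% lower bound = {lower_bound:.1f}")
```

Under these assumptions the lower bound agrees with the critical point of the one-sided interval in Figure 9-10.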



(c) The Confidence Interval as a General Technique

The reader may ask: "Doesn't (9-36) reduce hypothesis testing to a very
simple adjunct of interval estimation?" In a sense this is true. Whenever a
confidence interval has been constructed, it can immediately be used to test
any null hypothesis: the hypothesis is accepted if and only if it is in the
confidence interval. To emphasize this point, we can restate (9-36) in an
equivalent form:

    A confidence interval may be regarded as just
    the set of acceptable hypotheses.                        (9-38)
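
Statement (9-38) can be checked mechanically for the one-sided case. The following is a minimal sketch, not a general routine: it reuses the sample figures of the dam example, assumes the t-table value 1.711 (24 degrees of freedom, 5% in one tail), and the function names and the list of candidate hypotheses are hypothetical choices made for illustration.

```python
import math

n, x_bar, s = 25, 810.0, 30.0
T_05_24DF = 1.711  # ASSUMPTION: standard t-table value, 24 d.f., 5% in one tail
margin = T_05_24DF * s / math.sqrt(n)

def accepted_by_test(mu0):
    """One-sided test of H0: mu = mu0 against H1: mu > mu0."""
    critical_point = mu0 + margin   # H0 is rejected when x_bar exceeds this
    return x_bar <= critical_point

def inside_interval(mu0):
    """One-sided 95% confidence interval: mu >= x_bar - margin."""
    return mu0 >= x_bar - margin

# Hypothetical candidate values of mu0, chosen away from the boundary.
candidates = [790.0, 795.0, 799.0, 800.0, 805.0, 810.0, 820.0]
for mu0 in candidates:
    print(mu0, accepted_by_test(mu0), inside_interval(mu0))
```

For every candidate the two columns agree: a hypothesis is accepted by the test exactly when it lies in the confidence interval.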



9 I.e., the present value of these accumulated benefits. Issues such as the appropriate rate
of discount, and/or the extent to which benefits must exceed costs to justify the project are
also important considerations; but we concentrate here on the statistical issues.

The next question is whether, in view of this, our study of hypothesis
testing in this chapter has been a waste of time. Why not simply construct the
(single) appropriate confidence interval, and use this to test any null
hypothesis that anyone may suggest? There is a good deal of validity to this conclusion;
nevertheless, our brief study of hypothesis testing has been necessary for the
following reasons:

1. Historically, hypothesis testing has been frequently used in physical
and social science research. This technique must be understood to be evaluated;
specifically, the nature of type I and type II error and the warnings about
accepting \(H_0\) must be understood.

2. Certain hypotheses have no corresponding simple confidence interval, 
and are consequently tested on their own. 

3. The calculation of a prob-value provides additional information not 
available if the hypothesis is tested from a confidence interval. 

4. Hypothesis testing plays an important role in statistical decision 
theory, developed in Chapter 15. 

PROBLEMS 

9-15 Three different sources claim that the average income in a certain
profession is $7200, $6000, and $6400 respectively. You find from a
sample of 16 persons in the profession that their mean salary is $6030
and the standard deviation is $570.

(a) At the 5 % significance level, test each of the three hypotheses, one 
at a time. 

(b) Construct a 95% confidence interval for \(\mu\). Then test each of the
3 hypotheses by simply noting whether it is included in the confidence
interval.

(9-16) A sample of 8 students made the following marks: 3, 9, 6, 6, 8, 7, 8, 9.
Assume the population of marks is normal. At a 5% level of
significance, which of the following hypotheses about the mean mark (\(\mu\))
would you reject?

(a) \(\mu_0 = 8\).

(b) \(\mu_0 = 6.3\).

(c) \(\mu_0 = 4\).

(d) \(\mu_0 = 9\).

*9-17 As in the second example of Section 9-2(e), suppose a standard
process of manufacturing television tubes has a mean of 12,400 hours.
The engineers have found a new process which they hope is better
than the old standard. To establish this, a sample of 100 tubes from a
new process has a mean of 12,760 hours, and a standard deviation of
4000 hours.

(a) Construct a one-sided confidence interval for the new \(\mu\).

(b) Calculate the prob-value associated with the null hypothesis of 
no improvement. 

(c) At the 5% level of significance, do you reject the null hypothesis? 



9-5 CONCLUSIONS 



Hypothesis testing is a technique that must be used with great care,
for several reasons. First, the construction of a confidence interval is usually
preferred to a hypothesis test; the interval gives a clearer picture of the
observed sample result, whereas a test merely indicates whether or not the
sample is statistically significant.

Second, there are real problems (especially with a small sample) in
accepting an implausible \(H_0\); instead, the prob-value of the test should be
calculated. This provides a clear and immediate picture of how well the
statistical results match \(H_0\), leaving the rejection decision to the reader.

Finally, rejection of \(H_0\) does not answer the question "Is there any
practical economic (as opposed to statistically significant) difference between
our sample result and \(H_0\)?" This is the broader question of decision theory,
developed in Chapter 15.



Review Problems 

9-18 Four coins are tossed together 144 times. The average number of 
heads is 2.2. To answer a gambler who fears the coins are biased 
towards heads, calculate the prob-value associated with the null 
hypothesis of fair coins. 

9-19 A sample of 784 men and 820 women in 1962 showed that 30 percent
of the men and 22 percent of the women stated they were against the
John-Birch Society. The majority had no opinion.

(a) Letting \(\pi_M\) and \(\pi_W\) be the population proportion of men and
women respectively who are against the Society, construct a 95%
confidence interval for the difference \((\pi_M - \pi_W)\).

(b) What is the prob-value for the null hypothesis that \((\pi_M - \pi_W) = 0\)?

(c) At the 5% significance level, is the difference between men and
women statistically significant (i.e., do you reject the null hypothesis)?

(d) Would you judge this difference to be of sociological significance?

(9-20) Of 400 randomly selected townspeople in a certain city, 184 favored
a certain presidential candidate. Of 100 randomly selected students in
the same city, 40 favored the candidate.

(a) To judge whether the student population and town population 
have the same proportion favoring the candidate, calculate the prob- 
value. 

(b) Is the difference in the students and townspeople statistically 
significant, at the 5% level? 

9-21 To complete a certain task a sample of 100 workers in one plant took 
an average of 12 minutes, and a standard deviation of 2.5 minutes. 
A sample of 100 workers in a second plant took an average of 11 
minutes, and a standard deviation of 2.1 minutes. 

(a) Construct a 95% confidence interval for the difference in the two 
population means. 

(b) Calculate the prob-value for the null hypothesis that the two 
population means are the same. 

(c) Is the difference in the two sample means statistically significant 
at the 5% level? 

9-22 By talking to a random sample of 50 students, suppose you find that 
27 percent support a certain candidate for student government. To 
what extent does this invalidate the claim that only 20% of all the 
students support the candidate?

10-1 INTRODUCTION

In the last three chapters we have made inferences about one population
mean; moreover, in Section 8-1 we extended this to the difference in two
population means. Now we compare r means, using techniques commonly
called analysis of variance.¹ Since the development of this technique becomes
complicated and mathematical, we shall give a plausible, intuitive description
of what is involved, rather than rigorous proofs.



10-2 ONE-FACTOR ANALYSIS OF VARIANCE 

As an example, suppose that three machines (A, B, and C) are being
compared. Because these machines are operated by men, and for other
inexplicable reasons, output per hour is subject to chance fluctuation. In the
hope of "averaging out" and thus reducing the effect of chance fluctuation, a
random sample of 5 hours is obtained from each machine and set out in
Table 10-1, along with the mean of each sample.

Of the many questions which might be asked, the simplest are set out in 
Table 10-2. 



1 To keep the argument simple, we assume (among other things) that there is an equal size
sample (n) drawn from each of the r populations. While such balanced samples are typical
in the experimental sciences (such as biology and psychology), they are often impossible in
the nonexperimental sciences (e.g., economics and sociology). While analysis of variance
can be extended to take account of these circumstances, regression analysis (dealt with in
Chapters 11 to 14) is an equally good, and often preferred, technique. But regardless of
its limitations, analysis of variance is an enlightening way of introducing regression.




196 ANALYSIS OF VARIANCE 



Table 10-1 Sample Output of Three Machines

Machine, or
Sample Number    Sample from Machine i               \(\bar{X}_i\)
i = 1            48.4   49.7   48.7   48.5   47.7    48.6
i = 2            56.1   56.3   56.9   57.6   55.1    56.4
i = 3            52.1   51.1   51.6   52.1   51.1    51.6

                           Average \(\bar{X}_i = \bar{\bar{X}} = 52.2\)



Table 10-2

Question                                     How It Is Answered
(a) Are the machines different?              Analysis of Variance Table (test of hypothesis)
(b) How much are the machines different?     Multiple comparisons (simultaneous confidence intervals)



(a) Hypothesis Test 

The first question is "Are the machines really different?" That is, are
the sample means \(\bar{X}_i\) in Table 10-1 different because of differences in the
underlying population means \(\mu_i\) (where \(\mu_i\) represents the lifetime performance
of machine i)? Or may these differences in \(\bar{X}_i\) be reasonably attributed to
chance fluctuations alone? To illustrate, suppose we collect three samples
from one machine, as shown in Table 10-3. As expected, statistical
fluctuations cause small differences in sample means even though the \(\mu\)'s are
identical. So the question may be rephrased, "Are the differences in \(\bar{X}_i\) of Table
10-1 of the same order as those of Table 10-3 (and thus attributable to chance
fluctuation), or are they large enough to indicate a difference in the
underlying \(\mu\)'s?" The latter explanation seems more plausible; but how do we
develop a formal test?

Table 10-3 Three Samples of the Output of One Machine

Sample Number    Sample Values                       \(\bar{X}_i\)
i = 1            51.7   53.0   52.0   51.8   51.0    51.9
i = 2            52.1   52.3   52.9   53.6   51.1    52.4
i = 3            52.8   51.8   52.3   52.8   51.8    52.3

                                          \(\bar{\bar{X}} = 52.2\)

As before, the hypothesis of "no difference" in the population means
becomes the null hypothesis,

\[ H_0\colon \mu_1 = \mu_2 = \mu_3 \tag{10-1} \]

The alternate hypothesis is that some (but not necessarily all) of the \(\mu\)'s are
different,

\[ H_1\colon \mu_i \neq \mu_j \ \text{ for some } i \text{ and } j \tag{10-2} \]

To develop a plausible test of this hypothesis we first require a numerical
measure of the degree to which the sample means differ. We therefore take
the three sample means in the last column of Table 10-1 and calculate their
variance. Using formula (2-6) (and being very careful to note that we are
calculating the variance of the sample means and not the variance of all
values in the table), we have

\[
s_{\bar{X}}^2 = \frac{1}{r-1}\sum_{i=1}^{r}(\bar{X}_i - \bar{\bar{X}})^2
= \tfrac{1}{2}\left[(48.6 - 52.2)^2 + (56.4 - 52.2)^2 + (51.6 - 52.2)^2\right]
= 15.5 \tag{10-3}
\]

where r = number of rows (i.e., the number of sample means), and

\[
\bar{\bar{X}} = \text{average } \bar{X} = \frac{1}{r}\sum_{i=1}^{r}\bar{X}_i = 52.2 \tag{10-4}
\]

Yet \(s_{\bar{X}}^2\) does not tell the whole story; for example, consider the data of
Table 10-4, which has the same \(s_{\bar{X}}^2\) as Table 10-1, yet more erratic machines
that produce large chance fluctuations within each row.

Table 10-4 Samples of the Production of Three Different Machines

Machine i    Sample Output from Machine i
                                          \(\bar{\bar{X}} = 52.2\)

The implications of this are shown in Figure 10-1. In Figure 10-1a, the machines are so erratic
that all sample outputs could be drawn from the same population; i.e., the
differences in sample means may be explained by chance. On the other hand,
the (same) differences in sample means can hardly be explained by chance in
Figure 10-1b, because the machines in this case are not erratic.
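
The arithmetic of (10-3) and (10-4) can be retraced as follows. This is a sketch under one stated assumption: the fourth observation for machine 2 is entered as 57.6, the value that reproduces the sample mean of 56.4 quoted in Table 10-1. The variable names are illustrative only.

```python
from statistics import mean, variance

# Table 10-1. ASSUMPTION: the fourth value for machine 2 is taken as 57.6,
# which is the value consistent with the quoted sample mean of 56.4.
table_10_1 = {
    1: [48.4, 49.7, 48.7, 48.5, 47.7],
    2: [56.1, 56.3, 56.9, 57.6, 55.1],
    3: [52.1, 51.1, 51.6, 52.1, 51.1],
}

sample_means = [mean(values) for values in table_10_1.values()]
grand_mean = mean(sample_means)       # equation (10-4)
s2_xbar = variance(sample_means)      # equation (10-3); divisor is r - 1

print("sample means:", [round(m, 1) for m in sample_means])
print("grand mean:", round(grand_mean, 1))
print("variance of sample means:", round(s2_xbar, 2))
```

Note that `statistics.variance` divides by one less than the number of items, which here is (r - 1) = 2, matching the divisor in (10-3).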

We now have our standard of comparison. In Figure 10-1(b) we conclude
the \(\mu\)'s are different, and reject \(H_0\), because the variance in sample means
(\(s_{\bar{X}}^2\)) is large relative to the chance fluctuation.

How can we measure this chance fluctuation? Intuitively, we seem to be
interpreting it as the spread (or variance) of observed values within each
sample. Thus we compute the variance within the first sample in Table 10-1,

\[
s_1^2 = \frac{1}{n-1}\sum_{j=1}^{n}(X_{1j} - \bar{X}_1)^2
= \frac{(48.4 - 48.6)^2 + \cdots}{4} = .52 \tag{10-5}
\]

where \(X_{1j}\) is the jth observed value in the first sample.




FIG. 10-1 (a) Graph of Table 10-4. (b) Graph of Table 10-1. The populations appear to
be different. (In each panel the three samples i = 1, 2, 3 are plotted on an output scale running from 40 to 60.)

Similarly we compute the variance or chance fluctuation within the
second (\(s_2^2\)) and third (\(s_3^2\)) samples. The simple average of these

\[
s_p^2 = \frac{1}{r}\sum_{i=1}^{r} s_i^2 = \frac{.52 + .87 + .25}{3} = .547 \tag{10-6}
\]

becomes the measure of chance fluctuation, and is referred to as "pooled
variance." From each of the r samples, we have a sample variance with
(n - 1) degrees of freedom, so that the pooled variance \(s_p^2\) has r(n - 1)
degrees of freedom. The key question can now be stated. Is \(s_{\bar{X}}^2\) large relative
to \(s_p^2\)?


In practice, we examine the ratio

\[ F = \frac{n\,s_{\bar{X}}^2}{s_p^2} \tag{10-7} \]

called the "variance ratio." The factor n is introduced into the numerator so that,
whenever \(H_0\) is true, this ratio will have, on the average, a value near 1;
however, because of statistical fluctuation, it will sometimes be above 1,
and sometimes below.

If \(H_0\) is not true (and the \(\mu\)'s are not the same) then \(n\,s_{\bar{X}}^2\) will be relatively
large compared to \(s_p^2\), and the F value in (10-7) will tend to be greater than 1.
Formally, \(H_0\) is rejected if the computed value of F is significantly greater
than 1.

Before developing this test further, we interpret (10-7) from another
point of view. Suppose that our samples are drawn from three normal
populations with the same variance (in fact, these assumptions are necessary
for the formal test below). If in addition, \(H_0\) is true, and the three population
means are the same, then the division of our data into three samples is
meaningless. All observations could be viewed as one large sample drawn
from a single population. Now consider three alternative ways of estimating
\(\sigma^2\), the variance of that population.

1. The most obvious way is to estimate it by computing the variance of
the one large sample.

2. The second way is to estimate it by averaging the variances within
each of the 3 samples as in (10-5) and (10-6). This is the \(s_p^2\) in the denominator
of (10-7).

3. Infer \(\sigma^2\) from \(s_{\bar{X}}^2\), the observed variance of sample means. Recall
from Chapter 6 how the variance of sample means is related to the variance
of the population:

\[ \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} \tag{10-8} \]
(6-12) repeated

or

\[ \sigma^2 = n\,\sigma_{\bar{X}}^2 \tag{10-9} \]

This suggests estimating \(\sigma^2\) as \(n\,s_{\bar{X}}^2\), which is recognized as the numerator of
(10-7). We note that we are estimating population variance by "blowing up"
the observed variance of the sample means.

To recapitulate: if \(H_0\) is true, we can estimate \(\sigma^2\) by three valid methods.
Considering only the last two, we note that one appears in the numerator of
(10-7), the other in the denominator; they should be about equal, and their
ratio close to 1. [This establishes why n was introduced into the numerator of
(10-7).] If \(H_0\) is not true, however, the denominator will still reflect only chance
fluctuation, while the numerator will be a blow-up of the differences between
means; this ratio will consequently be large.
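
A small simulation makes the recapitulation concrete. The sketch below is an illustration only, not part of the formal argument: it draws r = 3 samples of n = 5 from a single normal population (so that \(H_0\) is true by construction) and averages the two competing estimates of \(\sigma^2\) over many repetitions. The population mean and standard deviation, the seed, and the number of repetitions are all assumed values chosen for the demonstration.

```python
import random
from statistics import mean, variance

random.seed(0)            # fixed seed so that the run is repeatable
r, n = 3, 5               # three samples of five observations each
mu, sigma = 52.2, 0.74    # ASSUMPTION: illustrative population, sigma^2 near .547
repetitions = 4000        # ASSUMPTION: arbitrary number of simulated experiments

blown_up_estimates = []   # n * (variance of sample means): numerator of (10-7)
pooled_estimates = []     # average within-sample variance: denominator of (10-7)

for _ in range(repetitions):
    samples = [[random.gauss(mu, sigma) for _ in range(n)] for _ in range(r)]
    means = [mean(sample) for sample in samples]
    blown_up_estimates.append(n * variance(means))
    pooled_estimates.append(mean([variance(sample) for sample in samples]))

avg_blown_up = mean(blown_up_estimates)
avg_pooled = mean(pooled_estimates)

print("true variance:           ", round(sigma ** 2, 3))
print("average of n * s_xbar^2: ", round(avg_blown_up, 3))
print("average of pooled s_p^2: ", round(avg_pooled, 3))
```

When every sample comes from the same population, both averages settle near the true variance, which is why their ratio F tends to be near 1 under \(H_0\).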

The formal test of \(H_0\), like any other test, requires knowledge of the
distribution of the observed statistic (in this case F) if \(H_0\) is true. This is
shown in Figure 10-2. The critical \(F_{.05}\) value, cutting off 5% of the upper tail
of the distribution, is also shown. Thus, if \(H_0\) is true there is only a 5%
probability that we would observe an F value exceeding 3.89, and consequently
reject \(H_0\). If we do observe such a value, it is conceivable, of course, that \(H_0\) is
true and we were very unlucky; but we choose the more plausible explanation that \(H_0\) is false.

To illustrate this procedure, let us reconsider the three sets of sample
results shown in Tables 10-1, 10-3, and 10-4, and in each case ask whether the
machines exhibit differences that are statistically significant. In other words,
in each case we test \(H_0\colon \mu_1 = \mu_2 = \mu_3\) against the alternative that they are
not equal. For the data in Table 10-3, an evaluation of (10-7) yields:

\[ F = \frac{.35}{.547} = .64 \tag{10-10} \]

Since this is below the critical \(F_{.05}\) value of 3.89, we conclude that the
observed differences in means can reasonably be explained by chance
fluctuations. (This is no surprise; recall that we generated these three samples
in Table 10-3 from the same machine.)




FIG. 10-2 The distribution of F when \(H_0\) is true (with 2, 12 degrees of freedom). The critical value 3.89 is marked on the F axis.

For the data in Table 10-4, the F ratio is

\[ F = \frac{77.4}{35.7} = 2.17 \tag{10-11} \]

In this case the difference between sample means (and consequently the
numerator) is much greater. But so is the chance fluctuation (reflected in a
large denominator). Again, the F value is less than the critical value 3.89,
so \(H_0\) is not rejected.

However, for the data in Table 10-1, the F ratio is

\[ F = \frac{77.4}{.547} = 141 \tag{10-12} \]

In this case, the difference in sample means is very large relative to the chance
fluctuation, making the F ratio far exceed the critical value 3.89, so that \(H_0\)
is rejected.

These three formal tests confirm our earlier intuitive conclusions. Table
10-1 provides the only case in which we conclude that the underlying
populations have different means.
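
The F ratios for Tables 10-1 and 10-3 can be recomputed from the raw observations. This is a minimal sketch for balanced samples only; the helper name `f_ratio` is hypothetical, the critical value 3.89 is the one quoted in the text, and the fourth observation for machine 2 in Table 10-1 is assumed to be 57.6 (the value consistent with the quoted mean and variance).

```python
from statistics import mean, variance

def f_ratio(samples):
    """Variance ratio (10-7) for r samples that all have the same size n."""
    n = len(samples[0])
    numerator = n * variance([mean(s) for s in samples])   # n * s_xbar^2
    denominator = mean([variance(s) for s in samples])     # pooled s_p^2
    return numerator / denominator

F_CRITICAL = 3.89   # 5% point with 2 and 12 degrees of freedom, from the text

table_10_1 = [      # ASSUMPTION: 57.6 for the fourth value of machine 2
    [48.4, 49.7, 48.7, 48.5, 47.7],
    [56.1, 56.3, 56.9, 57.6, 55.1],
    [52.1, 51.1, 51.6, 52.1, 51.1],
]
table_10_3 = [
    [51.7, 53.0, 52.0, 51.8, 51.0],
    [52.1, 52.3, 52.9, 53.6, 51.1],
    [52.8, 51.8, 52.3, 52.8, 51.8],
]

f_10_1 = f_ratio(table_10_1)
f_10_3 = f_ratio(table_10_3)

for label, f in [("Table 10-1", f_10_1), ("Table 10-3", f_10_3)]:
    verdict = "reject H0" if f > F_CRITICAL else "do not reject H0"
    print(f"{label}: F = {f:.2f} -> {verdict}")
```

The results agree with (10-12) and (10-10) up to rounding. Table 10-4 is omitted because its individual observations are not reproduced here.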



(b) The F Distribution 

This distribution is so important for later applications that it is worth
considering in some detail. The F distribution shown in Figure 10-2 is only
one of many; there is a different distribution depending on degrees of freedom
(r - 1) in the numerator, and degrees of freedom [r(n - 1)] in the
denominator. Intuitively, we can see why this is so. The more degrees of freedom in
calculating both numerator and denominator, the closer these two estimates
of variance will likely be to their target \(\sigma^2\); thus the more closely their ratio
will concentrate around 1. This is illustrated in Figure 10-3.

We could present a whole set of F tables, each corresponding to a
different combination of degrees of freedom. For purposes of practical testing,
however, only the critical 5% or 1% points are required, and are set out in
Table VII in the Appendix. From this table, we confirm the critical point
of 3.89 used in Figure 10-2.



(c) The ANOVA Table 

This section is devoted to a summary shorthand of how these calculations
are usually done. The model is summarized in Table 10-5. We confirm in
column 2 that all samples are assumed drawn from normal populations with
the same variance \(\sigma^2\) but, of course, means that may, or may not, differ.
(Indeed it is the possible differences in means that are being tested.)

FIG. 10-3 The F distribution, with various degrees of freedom in numerator and
denominator. Note how the critical point (for rejecting \(H_0\)) moves toward 1 as degrees of
freedom increase.



Table 10-5 Summary of Assumptions

(1) Population    (2) Assumed Distribution     (3) Observed Sample Values
1                 \(N(\mu_1, \sigma^2)\)       \(X_{1j}\) (j = 1 ··· n)
2                 \(N(\mu_2, \sigma^2)\)       \(X_{2j}\) (j = 1 ··· n)
⋮
r                 \(N(\mu_r, \sigma^2)\)       \(X_{rj}\) (j = 1 ··· n)

\(H_1\): these means are not all equal

The resulting calculations are conveniently laid out in Table 10-6, called
an ANOVA table, an obvious shorthand for ANalysis Of VAriance. This
is mostly a bookkeeping arrangement, with the first row showing calculations
of the numerator of the F ratio, and the second row the denominator; in
part (b) of this table we evaluate the specific example of the three machines in
Table 10-1.

In addition, this table provides two handy intermediate checks on our
calculations. One is on degrees of freedom in column 3. The other is on sums
of squares in column 2; the sum of squares between rows plus the sum of
squares within rows adds up to the total sum of squares.² When any variation
is divided by the appropriate degrees of freedom, the result is variance.

Table 10-6

(a) ANOVA Table, General

(1) Source of Variation | (2) Variation; Sum of Squares (SS) | (3) d.f. | (4) Variance; Mean Sum of Squares (MSS) | (5) F ratio
Between rows; "EXPLAINED" by differences in \(\bar{X}_i\) | \(n\sum_{i=1}^{r}(\bar{X}_i - \bar{\bar{X}})^2 = SS_r\) | (r - 1) | \(MSS_r = SS_r/(r - 1)\) | explained variance / unexplained variance
Within rows; residual variation, resulting from chance fluctuation, "UNEXPLAINED" | \(\sum_{i=1}^{r}\sum_{j=1}^{n}(X_{ij} - \bar{X}_i)^2 = SS_u\) | r(n - 1) | \(MSS_u = SS_u/r(n - 1)\) |
Total | \(\sum_{i=1}^{r}\sum_{j=1}^{n}(X_{ij} - \bar{\bar{X}})^2\) | (nr - 1) | |

(b) ANOVA Table, for Sample Values Shown in Table 10-1

(1) Source of Variation           (2) Variation   (3) d.f.   (4) Variance   (5) F ratio
Between machines; "EXPLAINED"     154.8           2          77.4           77.4/.547 = 141
Within machines; "UNEXPLAINED"    6.56            12         .547
Total                             161.4           14

The variance between rows is "explained" by the fact that the rows may
come from different parent populations (e.g., machines perform differently).
The variance within rows is "unexplained" because it is the random or chance
variation that cannot be systematically explained (by differences in machines).
Thus F is sometimes referred to as the variance ratio.

\[ F = \frac{\text{explained variance}}{\text{unexplained variance}} \tag{10-17} \]


This suggests a possible means of strengthening this F test. Suppose that these
three machines are sensitive to differences in temperature. Why not introduce
temperature explicitly into the analysis? If some of the previously unexplained
variation can now be explained by temperature, the denominator of (10-17)
will be reduced. With the larger F value that results we will have a more
powerful test of the machines (i.e., we will be in a stronger position to reject
\(H_0\)). Thus the introduction of other explanations of variance will assist us in
detecting whether one specific influence (machine) is important. This brings
us to two-way ANOVA in Section 10-3.

2 Proved as follows. The difference, or deviation, of any observed value \(X_{ij}\) from the mean
of all observed values \(\bar{\bar{X}}\) can be broken down into two parts.

Total deviation = explained deviation + unexplained deviation

\[ (X_{ij} - \bar{\bar{X}}) = (\bar{X}_i - \bar{\bar{X}}) + (X_{ij} - \bar{X}_i) \tag{10-13} \]

Thus, using Table 10-1 as an example, the third observation in the second sample
(56.9) is 4.7 greater than \(\bar{\bar{X}} = 52.2\). This total deviation can be broken down into

(56.9 - 52.2) = (56.4 - 52.2) + (56.9 - 56.4)

4.7 = 4.2 + .5

Thus most of this total deviation is explained by the machine (4.2), while very little (.5) is
unexplained, due to random fluctuations. Clearly (10-13) must always be true, since the
two occurrences of \(\bar{X}_i\) cancel.

Square both sides of (10-13) and sum over all i and j:

\[
\sum_i\sum_j (X_{ij} - \bar{\bar{X}})^2 = \sum_i\sum_j (\bar{X}_i - \bar{\bar{X}})^2
+ 2\sum_i\sum_j (\bar{X}_i - \bar{\bar{X}})(X_{ij} - \bar{X}_i)
+ \sum_i\sum_j (X_{ij} - \bar{X}_i)^2 \tag{10-14}
\]

On the right side, the middle (cross product) term is

\[ 2\sum_i (\bar{X}_i - \bar{\bar{X}})\sum_j (X_{ij} - \bar{X}_i) \]

which must be zero since the algebraic sum of deviations about the mean is always zero.
Furthermore, the first term on the right side of (10-14) is:

\[ \sum_i\sum_j (\bar{X}_i - \bar{\bar{X}})^2 = n\sum_i (\bar{X}_i - \bar{\bar{X}})^2 \tag{10-15} \]

since the summand is independent of j.
Substituting these two conclusions back into (10-14), we have:

\[
\sum_i\sum_j (X_{ij} - \bar{\bar{X}})^2 = n\sum_i (\bar{X}_i - \bar{\bar{X}})^2 + \sum_i\sum_j (X_{ij} - \bar{X}_i)^2 \tag{10-16}
\]

Total variation = explained variation + unexplained variation.
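
The bookkeeping check on sums of squares, i.e. the decomposition (10-16), can be verified numerically for the three machines. This sketch assumes the Table 10-1 data with 57.6 as the fourth observation for machine 2 (the value consistent with the quoted mean 56.4); the variable names are illustrative.

```python
from statistics import mean

samples = [   # Table 10-1 (ASSUMPTION: 57.6 for the fourth value of machine 2)
    [48.4, 49.7, 48.7, 48.5, 47.7],
    [56.1, 56.3, 56.9, 57.6, 55.1],
    [52.1, 51.1, 51.6, 52.1, 51.1],
]
r, n = len(samples), len(samples[0])

row_means = [mean(row) for row in samples]
grand_mean = mean(row_means)   # equals the mean of all values when n is equal

# Explained variation: n times the squared deviations of row means.
ss_between = n * sum((m - grand_mean) ** 2 for m in row_means)
# Unexplained variation: squared deviations of each value from its row mean.
ss_within = sum((x - m) ** 2 for row, m in zip(samples, row_means) for x in row)
# Total variation: squared deviations of each value from the grand mean.
ss_total = sum((x - grand_mean) ** 2 for row in samples for x in row)

df_between, df_within, df_total = r - 1, r * (n - 1), n * r - 1

print(f"between: SS = {ss_between:.2f}, d.f. = {df_between}")
print(f"within:  SS = {ss_within:.2f}, d.f. = {df_within}")
print(f"total:   SS = {ss_total:.2f}, d.f. = {df_total}")
```

Both intermediate checks hold: the two sums of squares add to the total, and the degrees of freedom add to (nr - 1).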



*(d) Confidence Intervals 

The difficulties with hypothesis tests cited in Chapter 9 hold true in the
ANOVA case as well. It may not be too enlightening to ask whether
population means differ; by increasing sample size enough, such a difference can
nearly always be established, even though it is too small to be of any
practical or economic importance. Again, it may be more important to find
out "by how much do population means differ?"

If we wanted to compare only two machines in Table 10-1, this would
be an easy question to answer: just construct a confidence interval for
\((\mu_1 - \mu_2)\) using \((\bar{X}_1 - \bar{X}_2)\) and the t distribution:

\[
(\mu_1 - \mu_2) = (\bar{X}_1 - \bar{X}_2) \pm t_{.025}\, s_p \sqrt{\frac{1}{n} + \frac{1}{n}} \tag{10-18}
\]
(8-17) repeated

In (8-17), \(s_p^2\) was the variance pooled from the two samples. However,
it is more reasonable to use all the information available, and pool the
variance from all three samples as in (10-6), obtaining \(s_p^2 = .547\) with
4 + 4 + 4 = 12 degrees of freedom. Thus the 95% confidence interval is

\[
(\mu_1 - \mu_2) = (48.6 - 56.4) \pm 2.179\sqrt{.547}\sqrt{\tfrac{1}{5} + \tfrac{1}{5}} = -7.8 \pm 1.0
\]

Similar confidence intervals for \((\mu_1 - \mu_3)\) and for \((\mu_2 - \mu_3)\) may be
constructed, giving:

\[
\begin{aligned}
(\mu_1 - \mu_2) &= -7.8 \pm 1.0 && \text{(a)} \\
(\mu_1 - \mu_3) &= -3.0 \pm 1.0 && \text{(b)} \\
(\mu_2 - \mu_3) &= +4.8 \pm 1.0 && \text{(c)}
\end{aligned} \tag{10-19}
\]

The results of this piece-by-piece approach are summarized in Table 10-7.

Table 10-7 Differences in Population Means (\(\mu_i - \mu_j\))
Estimated from Sample Means (\(\bar{X}_i - \bar{X}_j\)). 95% Level
of Confidence in Each Interval Estimate

 i \ j     1        2               3
 1         0        -7.8 ± 1.0      -3.0 ± 1.0
 2                  0               4.8 ± 1.0
 3                                  0
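
The entries of Table 10-7 follow from (10-18) with the pooled variance. The sketch below takes the sample means, the pooled variance .547, and the critical value 2.179 (t with 12 degrees of freedom, 2.5% in each tail) as quoted in the text; the variable names are illustrative.

```python
import math

sample_means = {1: 48.6, 2: 56.4, 3: 51.6}   # from Table 10-1
pooled_variance = 0.547                      # s_p^2 from (10-6), 12 d.f.
n = 5                                        # observations per machine
T_025_12DF = 2.179                           # critical t value quoted in the text

# Half-width of each individual 95% interval, as in (10-18).
half_width = T_025_12DF * math.sqrt(pooled_variance) * math.sqrt(1 / n + 1 / n)

intervals = {}
for i, j in [(1, 2), (1, 3), (2, 3)]:
    difference = sample_means[i] - sample_means[j]
    intervals[(i, j)] = (difference, half_width)
    print(f"mu_{i} - mu_{j} = {difference:+.1f} +/- {half_width:.1f}")
```

Because every sample has the same size and the same pooled variance is used throughout, the error allowance is identical for all three comparisons.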



*(e) Simultaneous Confidence Intervals: Multiple Comparisons 

There is just one difficulty with the above approach. Although we can be
95% confident of each individual statement [e.g., 10-19(a)], we can be far
less confident that the whole system of statements (10-19) is true; there are
three ways in which this could go wrong.

The level of confidence in the system (10-19) would be reduced to
\((.95)^3 = .857\), if the three individual statements were independent. But in
fact they are not; for example, they all involve the common term \(s_p\). Thus if
our observed \(s_p\) is high, all three interval estimates in (10-19) will be wide as a
consequence. The problem is how to allow for this dependence in order to
obtain the correct simultaneous confidence coefficient for the whole system.
In fact, this problem is usually stated the other way around: how much wider
must the individual intervals in (10-19) be in order to yield a 95% level of
confidence that all are simultaneously true?

Of the many solutions, we quote without proof the simplest, due to
Scheffé;³ with 95% confidence, all the following statements⁴ are true.



\[
\begin{aligned}
(\mu_1 - \mu_2) &= (\bar{X}_1 - \bar{X}_2) \pm \sqrt{F_{.05}}\; s_p \sqrt{\frac{(r-1)\,2}{n}} && \text{(a)} \\
(\mu_1 - \mu_3) &= (\bar{X}_1 - \bar{X}_3) \pm \sqrt{F_{.05}}\; s_p \sqrt{\frac{(r-1)\,2}{n}} && \text{(b)} \\
(\mu_2 - \mu_3) &= (\bar{X}_2 - \bar{X}_3) \pm \sqrt{F_{.05}}\; s_p \sqrt{\frac{(r-1)\,2}{n}} && \text{(c)}
\end{aligned} \tag{10-20}
\]

where

\(F_{.05}\) = the critical value of F (with r - 1 and r(n - 1) d.f.) leaving 5%
in the upper tail.

\(s_p^2\) = the pooled sample variance, as calculated in Table 10-6 or
equation (10-6)

r = number of rows (means) to be compared.
n = each sample size.

3 H. Scheffé, The Analysis of Variance, p. 66-73, New York: John Wiley, 1959.

4 And some other statements as well, as we shall see in (10-26). In fact if we were interested
only in the three comparisons of means in (10-20), our interval estimates could be made
slightly narrower.

We note the similarity of statements (10-20) and (10-19). For the
machines in Table 10-1, the actual simultaneous confidence intervals are

\[
\begin{aligned}
\mu_1 - \mu_2 &= (48.6 - 56.4) \pm \sqrt{3.89}\,(.74)\sqrt{\tfrac{2}{5}(2)} \\
              &= -7.8 \pm 1.3 && \text{(a)} \\
\mu_1 - \mu_3 &= -3.0 \pm 1.3 && \text{(b)} \\
\mu_2 - \mu_3 &= 4.8 \pm 1.3 && \text{(c)}
\end{aligned} \tag{10-21}
\]

These calculations are summarized in Table 10-8. As expected, the
width of the confidence interval is greater than in Table 10-7 (compare 1.3
versus 1.0). Indeed, it is this increased width (vagueness) that makes us 95%
confident that all statements are true.
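
The widening from 1.0 to 1.3 can be retraced numerically. This sketch uses only figures quoted in the text (critical F of 3.89 with 2 and 12 degrees of freedom, pooled variance .547, n = 5, r = 3); it also evaluates the joint confidence that would apply if the three individual 95% statements were independent, which they are not. Variable names are illustrative.

```python
import math

r, n = 3, 5
pooled_variance = 0.547   # s_p^2 from (10-6)
F_05 = 3.89               # critical F with (r - 1, r(n - 1)) = (2, 12) d.f.

# Joint confidence of three 95% statements IF they were independent.
independent_joint_confidence = 0.95 ** 3

# Scheffe half-width for a simple difference of two means, as in (10-20):
# the weights are +1 and -1, so the sum of squared weights is 2.
scheffe_half_width = (
    math.sqrt(F_05) * math.sqrt(pooled_variance) * math.sqrt((r - 1) * 2 / n)
)

print(f"joint confidence if independent: {independent_joint_confidence:.3f}")
print(f"simultaneous half-width:         {scheffe_half_width:.1f}")
```

The simultaneous allowance is wider than the individual allowance of 1.0 in Table 10-7, which is the price paid for 95% confidence in all statements at once.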

Table 10-8 Differences in Population Means (\(\mu_i - \mu_j\))
Estimated from Sample Means (\(\bar{X}_i - \bar{X}_j\)). 95% Level
of Confidence in All Interval Estimates. (Compare with
Table 10-7.)

 i \ j     1        2               3
 1         0        -7.8 ± 1.3      -3.0 ± 1.3
 2                  0               +4.8 ± 1.3
 3                                  0

As a bonus, this theory can be used to make any number of comparisons
of means, called "contrasts." A "contrast of means" is defined as a linear
combination, or weighted sum, with weights that add to zero:

\[ \sum_{i=1}^{r} C_i\,\mu_i \quad \text{provided} \quad \sum_{i=1}^{r} C_i = 0 \tag{10-22} \]

For example, the simplest contrast is the difference

\[ \mu_1 - \mu_2 = (+1)\mu_1 + (-1)\mu_2 + (0)\mu_3 \tag{10-23} \]

It was this contrast that was estimated in (10-21a). Another interesting
contrast is the difference between \(\mu_1\) and the average of \(\mu_2\) and \(\mu_3\):

\[ \mu_1 - \frac{\mu_2 + \mu_3}{2} = (+1)\mu_1 + (-\tfrac{1}{2})\mu_2 + (-\tfrac{1}{2})\mu_3 \tag{10-24} \]

There is no limit to the number of contrasts. It is no surprise that each
contrast of the population means will be estimated by the same contrast of
the sample means, plus or minus an error allowance. (10-21a) is one example.
As another example, the contrast of means given in (10-24) is estimated as

\[
\mu_1 - \tfrac{1}{2}\mu_2 - \tfrac{1}{2}\mu_3 = \bar{X}_1 - \tfrac{1}{2}\bar{X}_2 - \tfrac{1}{2}\bar{X}_3
\pm \sqrt{F_{.05}}\; s_p \sqrt{\frac{(r-1)\,3}{2n}} \tag{10-25}
\]

The general statement, from which (10-20) and (10-25) were derived, is

With 95% confidence, all contrasts are bracketed by the bounds:

\[
\sum_{i=1}^{r} C_i\,\mu_i = \sum_{i=1}^{r} C_i\,\bar{X}_i \pm \sqrt{F_{.05}}\; s_p \sqrt{\frac{(r-1)\sum C_i^2}{n}} \tag{10-26}
\]

provided only that \(\sum C_i = 0\) to satisfy the definition of "contrast." As before
\(s_p^2\) is pooled variance, and \(F_{.05}\) is the critical value of F.
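
The general bound (10-26) lends itself to a small helper that handles any contrast. The function below is a sketch, not a library routine: its name, signature, and tolerance are hypothetical, it assumes equal sample sizes, and the numerical inputs (sample means, pooled variance .547, critical F of 3.89) are those quoted in the text for the three machines.

```python
import math

def scheffe_interval(weights, sample_means, pooled_variance, n, f_critical):
    """Estimate and error allowance for one contrast, following (10-26).

    Assumes r samples of equal size n; weights must add to zero.
    """
    if abs(sum(weights)) > 1e-12:
        raise ValueError("contrast weights must add to zero")
    r = len(sample_means)
    estimate = sum(c * m for c, m in zip(weights, sample_means))
    sum_of_squared_weights = sum(c * c for c in weights)
    allowance = (
        math.sqrt(f_critical)
        * math.sqrt(pooled_variance)
        * math.sqrt((r - 1) * sum_of_squared_weights / n)
    )
    return estimate, allowance

means = [48.6, 56.4, 51.6]   # sample means from Table 10-1
S2_P, N, F_05 = 0.547, 5, 3.89

# Contrast (10-23): the simple difference mu_1 - mu_2.
difference = scheffe_interval([1, -1, 0], means, S2_P, N, F_05)
# Contrast (10-24): mu_1 against the average of mu_2 and mu_3.
versus_average = scheffe_interval([1, -0.5, -0.5], means, S2_P, N, F_05)

print("mu_1 - mu_2:            %.1f +/- %.1f" % difference)
print("mu_1 - (mu_2 + mu_3)/2: %.1f +/- %.1f" % versus_average)
```

The first call reproduces (10-21a); the second evaluates (10-25), where the sum of squared weights is 1 + 1/4 + 1/4 = 3/2.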

When we examine (10-26) more carefully, we discover that this defines a 
set of 95% simultaneous confidence intervals which includes not only the 
three statements in (10-20) but also statements like (10-25), and indeed an 
infinite number of contrasts that can be constructed. The student may 
justifiably wonder "How can we be 95% confident of an infinite number of 
statements?" The answer is: because these statements are dependent. Thus, 
for example, once we have made the first two statements in (10-21), our 
intuition tells us that the third is likely to follow. Moreover, once these three 
statements are made, intervals like (10-25) tend to follow, and can be added 
with little damage to our level of confidence. As the number of statements or 
contrasts grows and grows, each new statement tends to become simply a 
restatement of contrasts already specified, and essentially no damage is done 
to our level of confidence. Thus, it can be mathematically confirmed that the 
entire (infinite) set of contrasts in (10-26) are all simultaneously estimated at 
a 95% level of confidence. 
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PROBLEMS 



10-1 A sample of 4 workers was drawn at random from two different 
industries, with their average annual income (in $00) recorded, as 
follows: 

Industry A 66 62 65 63 
Industry B 58 56 53 61 

(a) Using first a t-test (as in Chapter 8) and then an ANOVA F-test,
calculate whether or not there is a statistically significant difference
in income at the 5% level.

(b) Are the t and F tests exactly equivalent? Can you see why the $t^2$
distribution is often referred to as the F distribution with 1 degree of
freedom in the numerator?

*(c) Using first the t distribution (8-17), and then the F distribution
(10-20), construct a 95% confidence interval for the difference in mean
income in the two industries.
=> 10-2 Twelve plots of land are randomly divided into 3 groups. The first
is held as a control group while 2 fertilizers A and B are applied to the
other 2 groups. Yield is observed to be:

Control, C   60   64   65   55
A            75   70   66   69
B            74   78   72   68

(a) At a 5% significance level, does fertilizer affect yield?
*(b) Construct a table of differences in means, similar to Table 10-8,
starring the differences that are statistically significant.
*(c) Can you be 95% confident that the two fertilizers have a different
effect?

*(d) What is the difference between a contrast of means, and a
weighted average of means?
10-3 You have observed the income (Y) of a sample of men and women
in a certain occupation to be:

Women   Men
48      60
56      70
50      62
54      48

(a) At a 5% level of significance, can you reject the null hypothesis
that mean income is the same for men and women?
*(b) Construct a 95% confidence interval for the difference in the two
means.

Since this problem is important later in Chapter 13 we state its solution.
(a) $\bar{Y}_1 = 52$, $\bar{Y}_2 = 60$, $\bar{Y} = 56$

ANOVA Table

Source           Variation   d.f.   Variance
Between sexes    128         1      128
Residual         288         6      48
Total            416         7

F = 128/48 = 2.67 is less than the critical value of 5.99, and thus is not
statistically significant.

*(b) Evaluate the first equation in (10-20); or, more simply, (10-18),
noting that $t_{.025} = \sqrt{F_{.05}}$:

$$(\mu_1 - \mu_2) = (52 - 60) \pm 2.45\sqrt{48}\sqrt{2/4} = -8 \pm 12$$

This also confirms the answer in (a); since this interval includes zero,
the difference is not statistically significant.
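
The arithmetic of this solution can be checked with a short script. The following is only a sketch: the function name `one_way_anova` is a hypothetical helper, and the critical t value 2.45 is simply copied from the solution above rather than computed.

```python
from math import sqrt


def one_way_anova(groups):
    """Return (SS_between, SS_within, df_between, df_within, F) for a list of samples."""
    all_values = [y for g in groups for y in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Explained variation: spread of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Unexplained variation: spread of observations around their own group mean
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio


women = [48, 56, 50, 54]
men = [60, 70, 62, 48]
ss_b, ss_w, df_b, df_w, f_ratio = one_way_anova([women, men])

# 95% interval for the difference in means, using the tabled t value from the text
t_crit = 2.45                      # t_.025 with 6 d.f. (assumed from the solution)
pooled_var = ss_w / df_w           # residual variance, 48
diff = sum(women) / len(women) - sum(men) / len(men)
allowance = t_crit * sqrt(pooled_var) * sqrt(1 / len(women) + 1 / len(men))

print(ss_b, ss_w, round(f_ratio, 2), diff, round(allowance, 1))
```

The printed sums of squares, F ratio, and interval can be compared line by line with the ANOVA table and the interval in part (b).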
*10-4 Referring to the machine example of Table 10-1 and ANOVA Table 
10-6(b), use equation (10-26) to incidentally solve the following 
problem : 

Suppose one factory is to be outfitted entirely with machines of 
the first type. Suppose a second factory is to be outfitted with 
machines of the second and third types, in the proportions 30% and 
70%. Find a 95% confidence interval for the difference in mean 
production for the 2 factories. 
10-5 From each of three large classes, 50 students were sampled, with the 
following results: 

Class Average Grade X Standard Deviation, s 
A 68 11 
B 73 12 
C 70 8 

Test whether the classes are equally good at a 5% significance level.

10-3 TWO-FACTOR ANALYSIS OF VARIANCE



(a) The ANOVA Table

We have already seen that the F test on the differences in machines given
in (10-17) would be strengthened if the unexplained variance could be reduced.
We suggested, for example, that if some unexplained variance is due to
temperature, this might be taken into account; or if some unexplained
variance is due to the human factor, we shall see how this might be adjusted
for. Suppose that the sample outputs given in Table 10-4 were produced by
five different machinists, with each machinist producing one of the sample
values on each machine. This data, reorganized according to a two-way
classification (by machine and operator), is shown in Table 10-9. It is necessary
to complicate our notation somewhat. We are now interested in the average
of each operator ($\bar{X}_{.j}$, each column average) as well as the average of each
machine ($\bar{X}_{i.}$, each row average⁵).

Now the picture is clarified; some operators are efficient (the first and
fourth), some are not. The machines are not that erratic after all; there is just
a wide difference in the efficiency of the operators. If we can explicitly adjust
for this, it will reduce our unexplained (or chance) variation in the denominator
of (10-17); since the numerator will remain unchanged, the F ratio will be
larger. It appears that another influence (difference in operators) was responsible for a

Table 10-9 Samples of Production ($X_{ij}$) of Three Different Machines (as given
in Table 10-4, but now arranged according to machine operator)

                                Operator j                       Machine average
Machine i                 1      2      3      4      5          $\bar{X}_{i.}$
1                         56.7   45.7   48.3   54.6   37.7       48.6
2                         64.5   53.4   54.3   57.5   52.3       56.4
3                         56.7   50.6   49.5   56.5   44.7       51.6
Operator average          59.3   49.9   50.7   56.2   44.9       $\bar{X}$ = 52.2
$\bar{X}_{.j}$

5 The dot indicates the subscript over which summation occurs. For example, the dot
suppresses the subscript j in $\bar{X}_{i.} = \frac{1}{n}\sum_j X_{ij}$.

Table 10-10 Two-Way ANOVA, General

(1)                              (2)                                                                                  (3)                (4)                                                      (5)
Source                           Variation;                                                                           d.f.               Variance;                                                F
                                 Sum of Squares (SS)                                                                                     Mean Sum of Squares (MSS)

Between rows; EXPLAINED by       $SS_r = c\sum_{i=1}^{r}(\bar{X}_{i.} - \bar{X})^2$                                   r - 1              $MSS_r = \frac{SS_r}{r-1} = c\,s_{\bar{X}_{i.}}^2$       $\frac{MSS_r}{MSS_u}$
differences in machines, i.e.,
differences in $\bar{X}_{i.}$

Between columns; EXPLAINED by    $SS_c = r\sum_{j=1}^{c}(\bar{X}_{.j} - \bar{X})^2$                                   c - 1              $MSS_c = \frac{SS_c}{c-1} = r\,s_{\bar{X}_{.j}}^2$       $\frac{MSS_c}{MSS_u}$
differences in operators, i.e.,
differences in $\bar{X}_{.j}$

UNEXPLAINED, i.e., residual      $SS_u = \sum_{i=1}^{r}\sum_{j=1}^{c}(X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X})^2$   (r - 1)(c - 1)     $MSS_u = \frac{SS_u}{(r-1)(c-1)}$
variation, resulting from
chance fluctuation.

Total                            $SS = \sum_{i=1}^{r}\sum_{j=1}^{c}(X_{ij} - \bar{X})^2$                              rc - 1

lot of extraneous noise in our simple one-way analysis in the previous section;
by removing this noise, we hope to get a much more powerful test of our
machines.

The analysis is an extension of the one-factor ANOVA, and is summarized
in Table 10-10. Of course, the small letter c represents the number
of columns in Table 10-9, and replaces n in Table 10-4. As before, the component
sources of variation shown in column 2 sum to the total variation
at the bottom of this column, i.e.,

$$\sum_{i=1}^{r}\sum_{j=1}^{c}(X_{ij} - \bar{X})^2 = c\sum_{i=1}^{r}(\bar{X}_{i.} - \bar{X})^2 + r\sum_{j=1}^{c}(\bar{X}_{.j} - \bar{X})^2 + \sum_{i=1}^{r}\sum_{j=1}^{c}(X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X})^2 \qquad (10\text{-}27)$$

Total variation = machine (row) variation + operator (column) variation + random variation

We note that operator variation is defined like machine variation; the only
difference is that this is defined as the variation exhibited by column means.
(10-27) is established by a complex set of manipulations, parallel to those
used to establish (10-16) in the simpler case. (The last term, the random
variation, in (10-27) may seem a bit puzzling; it will be interpreted below.)



(b) Testing Hypotheses 



With the total variation broken down into components in (10-27), we 
can now test whether there is a significant difference in machines, or whether 
there is a significant difference in operators; in either test the extraneous 
influence of the other factor will be taken into account. 

On the one hand, we test for differences in machines by constructing
the ratio

$$F = \frac{MSS_r}{MSS_u} = \frac{\text{variance explained by machines}}{\text{unexplained variance}} \qquad (10\text{-}28)$$

which, if $H_0$ is true, has an F distribution. Thus, if the observed F calculated
in (10-28) exceeds the critical F value we may reject the null hypothesis,
concluding that there is a difference in population row means.

Our calculations are shown in full in Table 10-11, whence (10-28) is
evaluated as:

$$F = \frac{77.4}{5.9} = 13.1 \qquad (10\text{-}29)$$

Since this exceeds the critical⁶ F value of 4.46, we reject the null hypothesis

6 With 2 and 8 d.f., and 5% significance.

Table 10-11 Two-Way ANOVA, for Observations Given in Table 10-9



(1)                  (2)           (3)     (4)          (5)     (6)
Source               Variation;    d.f.    Variance;    F       Critical F
                     (SS)                  (MSS)

Between machines     154.8         2       77.4         13.1    4.46
Between operators    381.6         4       95.4         16.2    3.84
Residual variation   47.3          8       5.9

Total                583.7 ✓       14 ✓


that the machines are similar. We now compare this with our F test in (10-11),
where we could not reject the null hypothesis. The numerator remains
unchanged, but the chance variation in the denominator is much smaller,
since the effect of differing operators has been netted out. This has given us
greater statistical leverage,⁷ allowing rejection of the null hypothesis.

Similarly, we might test the null hypothesis that the operators perform
equally well. Once again F is the ratio of an explained to an unexplained
variance; but this time, of course, the numerator is the variance estimated
from column differences. Thus

$$F = \frac{\text{variance explained by operators}}{\text{unexplained variance}} = \frac{MSS_c}{MSS_u} = \frac{95.4}{5.9} = 16.2 \qquad (10\text{-}30)$$

In this case, the "machine" noise has been isolated; as a consequence we get
a strong test of how operators compare. Since our observed F value of 16.2
exceeds the critical F value⁸ of 3.84, we reject the null hypothesis, concluding
that machinists do differ.
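
The two-way breakdown of Table 10-10 can be reproduced from the observations of Table 10-9 with a few lines of code. This is a minimal sketch, not the book's own procedure: the function name `two_way_anova` and the dictionary keys are hypothetical, and critical F values are not computed. Because the script carries full precision, a few entries may differ in the last digit from the rounded figures printed in Table 10-11.

```python
def two_way_anova(table):
    """Two-way ANOVA with one observation per cell and no interaction term.

    table[i][j] is the observation for row i (machine) and column j (operator).
    """
    r, c = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]

    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    # Unexplained variation: what is left after adjusting for row and column
    ss_resid = sum(
        (table[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(r)
        for j in range(c)
    )
    ss_total = sum((x - grand) ** 2 for row in table for x in row)

    ms_rows = ss_rows / (r - 1)
    ms_cols = ss_cols / (c - 1)
    ms_resid = ss_resid / ((r - 1) * (c - 1))
    return {
        "ss_rows": ss_rows, "ss_cols": ss_cols,
        "ss_resid": ss_resid, "ss_total": ss_total,
        "ms_rows": ms_rows, "ms_cols": ms_cols, "ms_resid": ms_resid,
        "f_rows": ms_rows / ms_resid, "f_cols": ms_cols / ms_resid,
    }


# Observations of Table 10-9: rows are machines, columns are operators
production = [
    [56.7, 45.7, 48.3, 54.6, 37.7],
    [64.5, 53.4, 54.3, 57.5, 52.3],
    [56.7, 50.6, 49.5, 56.5, 44.7],
]
anova = two_way_anova(production)
for name, value in anova.items():
    print(f"{name:9s} {value:8.2f}")
```

The three component sums of squares add up to the total, which is the decomposition asserted in (10-27).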

There is one issue that we passed over quickly, that still requires clarifica- 
tion. In our one-factor test we calculated unexplained variation by looking 
at the spread of n observed values within a category, or cell, e.g., within a 
whole row in Table 10-4. But in the two-way test (Table 10-9) we have split 
our observations columnwise, as well as rowwise; this has left us with only 

7 Strictly speaking, we have a stronger test because we have gained more by reducing 
unexplained variance than we have lost because our degrees of freedom in the denominator 
have been reduced by 4. (The student will observe that if we are short of degrees of freedom— 
i.e., if we are near the top of F Table VII, loss of degrees of freedom may be serious.) 

8 Different than in the previous test since degrees of freedom are now 4 and 8.

one observation in each cell. Thus, for example, there is only a single observation
(57.5) of how much output is produced by operator 4 on machine 2.
Variation can no longer be computed within that cell. What should we do?

We ask, "If there were no random error, how would we predict the
output of operator 4 on machine 2?" We note, informally, that this is a
better-than-average machine ($\bar{X}_{2.} = 56.4$) and a relatively efficient operator
($\bar{X}_{.4} = 56.2$). On both counts we would predict output to be above average.
This strategy can easily be formalized to predict $\hat{X}_{24}$. We can do this for each
cell, with the random element estimated as the difference between our observed
value ($X_{ij}$) and the corresponding predicted value ($\hat{X}_{ij}$). This yields a whole
set of random elements, whose sum of squares is precisely the unexplained
variation⁹ $SS_u$ (the last term in equation (10-27), also appearing in column 2
of Table 10-10); divided by d.f., this becomes the unexplained variance used
in the denominator of both tests in this section.

One final warning: in computing predicted output $\hat{X}_{ij}$ we assume that
there is no interaction between the two factors as would occur, for example,
if certain operators like some machines, and dislike others; such interaction
would require a more complex model, and more sample observations. The
9 Predicted value $\hat{X}_{ij}$ is defined as

$$\hat{X}_{ij} = \bar{X} + \text{adjustment reflecting machine performance} + \text{adjustment reflecting operator performance} \qquad (10\text{-}31)$$

$$\hat{X}_{ij} = \bar{X} + (\bar{X}_{i.} - \bar{X}) + (\bar{X}_{.j} - \bar{X}) \qquad (10\text{-}32)$$

Specifically, in our example

$$\hat{X}_{24} = 52.2 + (56.4 - 52.2) + (56.2 - 52.2) = 52.2 + 4.2 + 4.0 = 60.4$$

Thus, our prediction of the performance of operator 4 on machine 2 is calculated by
adjusting average performance (52.2) by the degree to which this machine is above average
(4.2) and the degree to which this operator is above average (4.0).
Cancelling $\bar{X}$ values in (10-32):

$$\hat{X}_{ij} = \bar{X}_{i.} + \bar{X}_{.j} - \bar{X} \qquad (10\text{-}33)$$

and the random element, being the difference between the observed and expected, becomes:

$$X_{ij} - \hat{X}_{ij} = X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X} \qquad (10\text{-}34)$$

We emphasize that this random element is output left unexplained after adjustment for
both machine i and operator j.
In our example

$$X_{24} - \hat{X}_{24} = 57.5 - 60.4 = -2.9 \qquad (10\text{-}35)$$

Thus, this observed output is 2.9 units below what we expected, and must be left
unexplained, the result of random influences.

Unexplained variation ($SS_u$) is recognized to be the sum of squares of all random
elements as defined in (10-34).

two-way analysis of variance developed in this section is based on the assumption
that interaction does not exist.
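
The prediction rule (10-33) and the random elements (10-34) are easy to tabulate for every cell. The sketch below is illustrative only; the names `predicted` and `residuals` are hypothetical, and the data are the observations of Table 10-9.

```python
production = [
    [56.7, 45.7, 48.3, 54.6, 37.7],   # machine 1, operators 1..5
    [64.5, 53.4, 54.3, 57.5, 52.3],   # machine 2
    [56.7, 50.6, 49.5, 56.5, 44.7],   # machine 3
]
r, c = len(production), len(production[0])
grand = sum(sum(row) for row in production) / (r * c)
row_means = [sum(row) / c for row in production]
col_means = [sum(production[i][j] for i in range(r)) / r for j in range(c)]


def predicted(i, j):
    """Predicted output for machine i, operator j (0-based), as in (10-33)."""
    return row_means[i] + col_means[j] - grand


# Random element in each cell: observed minus predicted, as in (10-34)
residuals = [
    [production[i][j] - predicted(i, j) for j in range(c)]
    for i in range(r)
]

pred_24 = predicted(1, 3)          # machine 2, operator 4
resid_24 = residuals[1][3]
ss_u = sum(e ** 2 for row in residuals for e in row)

print(round(pred_24, 1), round(resid_24, 1), round(ss_u, 1))
```

A useful side check is that the random elements sum to zero along every row and every column, which is why only (r - 1)(c - 1) of them are free to vary.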



*(c) Multiple Comparisons 

Turning from hypothesis tests to confidence intervals, we may write a
statement for two-factor ANOVA which is quite similar to (10-26):

With 95% confidence, all contrasts in row means
fall within the bounds:

$$\sum_{i=1}^{r} C_i \mu_i = \sum_{i=1}^{r} C_i \bar{X}_{i.} \pm \sqrt{F_{.05}}\; s \sqrt{\frac{(r-1)\sum_{i=1}^{r} C_i^2}{c}} \qquad (10\text{-}36)$$

where
$F_{.05}$ = the critical value of F, with (r - 1) and (r - 1)(c - 1) d.f.
s = $\sqrt{MSS_u}$, as calculated in Table 10-10, column 4
r = number of rows
c = number of columns

Note that (10-36) differs from (10-26) because unexplained variance
is now smaller, making the confidence interval more precise.

As an example, consider the machines of Table 10-9, analyzed in ANOVA
Table 10-11. With 95% confidence, all the following statements are true:

$$\mu_1 - \mu_2 = (48.6 - 56.4) \pm \sqrt{4.46}\,\sqrt{5.9}\,\sqrt{\tfrac{2}{5}(2)}$$

i.e.,

$\mu_1 - \mu_2 = -7.8 \pm 4.5$*
$\mu_1 - \mu_3 = -3.0 \pm 4.5$
$\mu_2 - \mu_3 = 4.8 \pm 4.5$*
and all other possible contrasts     (10-37)

[Intervals that do not overlap zero are starred to indicate their statistical
significance: thus $H_0$ (no difference in means) would be rejected in these
cases, another illustration of how confidence intervals may be used to test
hypotheses.]
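
The simultaneous intervals for the machines can be sketched in code as follows. The helper name `contrast_interval` and its parameter names are hypothetical; the critical F value (4.46) and the unexplained variance (5.9) are taken from Table 10-11 rather than computed. With unrounded square roots the allowance comes out near 4.6, slightly above the 4.5 printed in the text, presumably a matter of rounding; the starred intervals are the same either way.

```python
from math import sqrt


def contrast_interval(coeffs, means, f_crit, ms_unexplained, n_per_mean):
    """Estimate and error allowance for one contrast of means, in the form of (10-36).

    n_per_mean is the number of observations averaged in each mean
    (the number of columns c when contrasting row means).
    """
    if abs(sum(coeffs)) > 1e-9:
        raise ValueError("contrast coefficients must sum to zero")
    k = len(means)  # number of means being compared (r for row means)
    estimate = sum(cf * m for cf, m in zip(coeffs, means))
    allowance = (
        sqrt(f_crit)
        * sqrt(ms_unexplained)
        * sqrt((k - 1) * sum(cf * cf for cf in coeffs) / n_per_mean)
    )
    return estimate, allowance


machine_means = [48.6, 56.4, 51.6]   # row averages of Table 10-9
results = {}
for label, coeffs in {
    "mu1 - mu2": [1, -1, 0],
    "mu1 - mu3": [1, 0, -1],
    "mu2 - mu3": [0, 1, -1],
}.items():
    est, allow = contrast_interval(coeffs, machine_means, 4.46, 5.9, n_per_mean=5)
    significant = abs(est) > allow   # interval does not overlap zero
    results[label] = (est, allow, significant)
    print(f"{label} = {est:5.1f} +/- {allow:.2f}{'*' if significant else ''}")
```

Any other set of coefficients summing to zero can be passed in the same way, which is the sense in which the statement covers all possible contrasts.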

Of course, we could contrast the column means equally well, by simply
interchanging r and c in equation (10-36). As an example, how do the
operators of Table 10-9 compare, when analyzed in ANOVA Table 10-11?
With 95% confidence, all the following statements are true:

$$\mu_1 - \mu_2 = (59.3 - 49.9) \pm \sqrt{3.84}\,\sqrt{5.9}\,\sqrt{\tfrac{4}{3}(2)}$$

i.e.,

$\mu_1 - \mu_2 = 9.4 \pm 7.8$*
$\mu_1 - \mu_3 = 8.6 \pm 7.8$*
$\mu_1 - \mu_4 = 3.1 \pm 7.8$
$\mu_1 - \mu_5 = 14.4 \pm 7.8$*
$\mu_2 - \mu_3 = -0.8 \pm 7.8$

and all other possible contrasts, of the form

$$\sum C_j \mu_j = \sum C_j \bar{X}_{.j} \pm 5.5\sqrt{\sum C_j^2} \qquad (10\text{-}38)$$

For example,

$$\frac{\mu_1 + \mu_3 + \mu_4}{3} - \frac{\mu_2 + \mu_5}{2} = (55.4 - 47.4) \pm 5.5\sqrt{5/6} = 8.0 \pm 5.0\text{*}$$

This last contrast might be of interest if workers 1, 3, and 4 are men, and
workers 2 and 5 are women; thus the average difference in men and women
has been estimated, as a bonus.
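
As a rough check on the operator comparisons, the sketch below lists every pairwise difference in operator means together with the men-versus-women contrast. The names `allowance` and `contrast` are hypothetical helpers, and the constants 3.84 and 5.9 are read from Table 10-11 rather than computed.

```python
from itertools import combinations
from math import sqrt

operator_means = [59.3, 49.9, 50.7, 56.2, 44.9]   # column averages of Table 10-9
F_CRIT = 3.84        # critical F with 4 and 8 d.f. (from Table 10-11)
MS_U = 5.9           # unexplained variance (from Table 10-11)
R, C = 3, 5          # rows (machines) and columns (operators)


def allowance(coeffs):
    """Error allowance for a contrast of column means: (10-36) with r and c interchanged."""
    return sqrt(F_CRIT) * sqrt(MS_U) * sqrt((C - 1) * sum(k * k for k in coeffs) / R)


def contrast(coeffs):
    """Return (estimate, allowance) for a contrast of the operator means."""
    estimate = sum(k * m for k, m in zip(coeffs, operator_means))
    return estimate, allowance(coeffs)


# All pairwise differences, keyed by (i, j) with operators numbered from 1
pairwise = {}
for i, j in combinations(range(C), 2):
    coeffs = [0.0] * C
    coeffs[i], coeffs[j] = 1.0, -1.0
    est, allow = contrast(coeffs)
    pairwise[(i + 1, j + 1)] = (round(est, 1), abs(est) > allow)

significant_pairs = sorted(pair for pair, (_, sig) in pairwise.items() if sig)

# Operators 1, 3, 4 (men) against operators 2, 5 (women)
men_vs_women = contrast([1 / 3, -1 / 2, 1 / 3, 1 / 3, -1 / 2])

print(pairwise)
print(significant_pairs)
print(round(men_vs_women[0], 1), round(men_vs_women[1], 1))
```

The pairs flagged as significant correspond to the starred entries of Table 10-12.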

The first part of equation (10-38) — all differences in means — can be 
presented more concisely, in Table 10-12. 



Table 10-12 Differences in Operator Means $\mu_i - \mu_j$
[Estimated from the sample means ($\bar{X}_{.i} - \bar{X}_{.j}$). To
construct 95% simultaneous confidence intervals, take
the listed value ±7.8. Statistically significant differences
are starred.]

i \ j    1      2       3       4       5
1               9.4*    8.6*    3.1     14.4*
2                       -.8     -6.3    5.0
3                               -5.5    5.8
4                                       11.3*
5

PROBLEMS

10-6 To refine the experimental design of Problem 10-2, suppose the twelve
plots of land are on 4 farms (3 plots on each). Moreover, you suspect
that there may be a difference in fertility between farms. You now
retabulate the data in Problem 10-2, according to fertilizer and farm
as follows.

                 Farm
Fertilizer       1     2     3     4
Control C        60    64    65    55
A                69    75    70    66
B                72    74    78    68


(a) Reanalyze whether or not the fertilizers differ, at the 5% sig- 
nificance level. 

(b) Is there, after all, a difference in the fertility of the four farms? 
(Use a 5% significance level.) 

*(c) Construct a table of differences in fertilizers similar to Table
10-12, starring differences that are statistically significant; also
construct a table of differences in farms.
(10-7) Three men work on an identical task of packing boxes. The number of
boxes packed by each in 3 given hours is shown in the table below.

                 Man
Hour             A     B     C
11-12 A.M.       21    18    21
1-2 P.M.         22    22    25
4-5 P.M.         17    16    18


(a) Test whether each factor is statistically significant at the 5 % level. 
*(b) For the factors which are statistically significant, construct a 
table of simultaneous 95% confidence intervals as in Table 10-12. 
10-8 Five children were tested for pulse rate before and after a certain
television program, with the following results:

         Before    After
A        96        104
B        102       112
C        108       112
D        89        93
E        85        89

(a) Test whether pulse rate changes, at the 5% significance level.
*(b) Construct a 95% confidence interval for the change in pulse rate
for the population of all children.
10-9 Rework Problem 10-8 using the following technique (matched t-test).
First, tabulate the change in pulse rate:

Before (X)    After (Y)    Difference D = (Y - X)
96            104          +8
102           112
108           112
89            93
85            89

The column of differences is a sample to estimate $\Delta = \mu_Y - \mu_X$.

The sample of D's fluctuates around the true difference $\Delta$. Now apply
equation (8-15).



chapter 11

Introduction to Regression 



Our first example of statistical inference (in Chapter 7) was estimating 
the mean of a single population. This was followed (Chapter 8) by a compari- 
son of two population means. Finally (Chapter 10) r population means were 
compared, using analysis of variance. We now consider the question "Can 
the analysis be improved upon if the r populations do not fall in unordered 
categories, but are ranked numerically?" 

For example, it is easy to see how the analysis of variance could be used
to examine whether wheat yield depended on 7 different kinds of fertilizer.¹
Now we wish to consider whether yield depends on 7 different amounts of
fertilizer; in this case, fertilizer application is defined in a numerical scale.
If yield (Y) that follows from various fertilizer applications (X) is plotted, a
scatter similar to Figure 11-1 might be observed. From this scatter it is clear
that the amount of fertilizer does matter. Moreover, it should be possible to
define how fertilizer affects yield, i.e., define an equation describing the
dependence of Y on X. Estimating an equation is of course equivalent
geometrically to fitting a curve through this scatter, the so-called regression
of Y on X. This regression will be a simple mathematical model, useful as a
brief and precise description, or as a means of predicting the yield Y for a
given amount of fertilizer X. Regression is the most useful of all statistical
techniques. As another example, in economics it provides a means of defining
how the quantity of a good demanded depends on its price, or how consumption
depends on income.

FIG. 11-1 Observed relation of wheat yield to fertilizer application. (Horizontal axis: fertilizer, lb/acre, from 100 to 700.)

1 By extending Problem 10-2.

This chapter is devoted exclusively to how a straight line may best be
fitted. The characteristics of this line (e.g., its slope) may be subjected to
statistical tests of significance; but these issues are deferred to Chapter 12.
Furthermore, it is possible that Y is related to X in a more complicated
nonlinear way; but these issues are not dealt with here. Instead we assume
that the appropriate description is a straight line.



11-1 AN EXAMPLE 

Since wheat yield depends on fertilizer, it is referred to as the "dependent"
variable Y; since fertilizer application is not dependent on yield, but instead
is determined by the experimenter, it is referred to as the "independent"
variable X. Suppose funds are available for only seven experimental observations,
so that the experimenter sets X at seven different values, taking only
one observation Y in each case, shown in Figure 11-2 and Table 11-1.

FIG. 11-2 Observed wheat yields at various levels of fertilizer application. (Horizontal axis: fertilizer, lb/acre, from 100 to 700.)

Table 11-1 Experimental Data Relating Yield of Wheat to the
Amount of Applied Fertilizer, as in Figure 11-2

X                Y
Fertilizer       Yield
(lb/acre)        (bu/acre)
100              40
200              45
300              50
400              65
500              70
600              70
700              80



We first of all note that if the points were exactly in a line, as in Figure
11-3a, then the fitted line could be drawn in with a ruler "by eye" perfectly
accurately. Even if the points were nearly in a line, as in Figure 11-3b, fitting
by eye would be reasonably satisfactory. But in the highly scattered case, as
in Figure 11-3c, fitting by eye is too subjective and too inaccurate. Furthermore,
fitting by eye requires plotting all the points first. If there were 100
experimental observations this would be very tedious, and an algebraic
technique which an electronic computer could solve would be preferable.

FIG. 11-3 Various degrees of scatter, panels (a), (b), and (c).

The following sections set forth various algebraic methods for fitting a
line, successively more sophisticated and satisfactory.



11-2 POSSIBLE CRITERIA FOR FITTING A LINE 

It is time to ask more precisely "What is a good fit?" The answer surely
is, "a fit that makes the total error small." One typical error is shown in
Figure 11-4. It is defined as the vertical distance from the observed $Y_i$ to the
fitted line, i.e., $(Y_i - \hat{Y}_i)$, where $\hat{Y}_i$ is the "fitted value of Y" or the ordinate
of the line. We note that the error is positive when the observed $Y_i$ is above
the line and negative when the observed $Y_i$ is below the line.

FIG. 11-4 Error in fitting points with a line.

1. As our first tentative criterion, consider a fitted line which minimizes
the total of all these errors,

$$\sum_{i=1}^{n} (Y_i - \hat{Y}_i) \qquad (11\text{-}1)$$

But this criterion works badly. Using this criterion, the two lines shown in
Figure 11-5 fit the observations equally well, even though the fit in Figure
11-5a is intuitively a good one, and the fit in Figure 11-5b is a very bad one.
The problem is one of sign; in both cases positive errors just offset negative
errors, leaving their sum equal to zero. This criterion must be rejected, since
it provides no distinction between bad fits and good ones.

FIG. 11-5, panels (a) and (b).

2. There are two ways of overcoming the sign problem. The first is to
minimize the sum of the absolute values of the errors,

$$\sum |Y_i - \hat{Y}_i| \qquad (11\text{-}2)$$

Since large positive errors are not allowed to offset large negative ones,
this criterion would rule out bad fits like Figure 11-5b. However, it still has a
drawback. It is evident in Figure 11-6 that the fit in part b satisfies this
criterion better than the fit in part a ($\sum |Y_i - \hat{Y}_i|$ is 3, rather than 4). In
fact, the reader can satisfy himself that the line in part b joining the two end
points satisfies this criterion better than any other line. But it is not a good
common-sense solution to the problem, because it pays no attention whatever
to the middle point. The fit in part a is preferable because it takes
account of all points.

3. As a second way to overcome the sign problem, we finally propose
to minimize the sum of the squares of the errors,

$$\sum (Y_i - \hat{Y}_i)^2 \qquad (11\text{-}3)$$

This is the famous "least squares" criterion; its justifications include:

(a) Squaring overcomes the sign problem by making all errors positive.

(b) Squaring emphasizes the large errors, and in trying to satisfy this
criterion large errors are avoided if at all possible. Hence all points are
taken into account, and the fit in Figure 11-6a is selected by this criterion
in preference to Figure 11-6b.

(c) The algebra of least squares is very manageable.

(d) There are two important theoretical justifications for least squares,
developed in the next chapter.
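
The three criteria can be compared numerically. The sketch below uses the wheat data of Table 11-1 and two candidate lines through the point (400, 60): one with slope .068, and a deliberately bad one with slope -.068. The second line and the helper names `errors` and `summarize` are assumptions made purely for illustration. Both lines have errors that sum to zero, so criterion 1 cannot tell them apart, while the absolute-value and squared criteria clearly can.

```python
fertilizer = [100, 200, 300, 400, 500, 600, 700]   # X, lb/acre (Table 11-1)
wheat_yield = [40, 45, 50, 65, 70, 70, 80]          # Y, bu/acre (Table 11-1)


def errors(intercept_at_mean, slope):
    """Vertical errors Y - fitted for a line written in terms of x = X - 400."""
    return [
        y - (intercept_at_mean + slope * (x - 400))
        for x, y in zip(fertilizer, wheat_yield)
    ]


def summarize(errs):
    """Return the three candidate criteria: sum, sum of absolute values, sum of squares."""
    return (
        sum(errs),
        sum(abs(e) for e in errs),
        sum(e * e for e in errs),
    )


good = summarize(errors(60, 0.068))     # upward-sloping line
bad = summarize(errors(60, -0.068))     # hypothetical line sloping the wrong way

print("good line: sum=%.2f  abs=%.2f  squares=%.2f" % good)
print("bad line:  sum=%.2f  abs=%.2f  squares=%.2f" % bad)
```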



11-3 THE LEAST SQUARES SOLUTION 

Our scatter of observed X and Y values from Table 11-1 is graphed in
Figure 11-7a. Our objective is to fit a line

$$Y = a_0 + bX \qquad (11\text{-}4)$$

This involves three steps:

Step 1. Translate X into deviations from its mean; i.e., define a new
variable x, so that:

$$x = X - \bar{X} \qquad (11\text{-}5)$$

In Figure 11-7b we show how this involves a geometric translation of axis
similar to the procedure developed in Section 5-3, where both axes were
translated to study covariance. The new x value becomes positive or negative
depending on whether X was above or below $\bar{X}$. There is no change in the
Y values. The intercept a differs from the original $a_0$, but the slope b remains
the same.

One of the advantages of measuring $X_i$ as deviations from their central
value is that we can more explicitly ask the question "How is Y affected when
X is unusually large, or unusually small?" In addition, the mathematics will
be simplified because the sum of the new x values equals zero,²

$$\sum x_i = 0 \qquad (11\text{-}6)$$

2 Proof:

$$\sum x_i = \sum (X_i - \bar{X}) = \sum X_i - n\bar{X}$$

Noting that the mean $\bar{X}$ is defined as $\frac{\sum X_i}{n}$, it follows that $\sum X_i = n\bar{X}$ and

$$\sum x_i = n\bar{X} - n\bar{X} = 0 \qquad (11\text{-}6)\ \text{proved}$$

FIG. 11-7 Translation of axis. (a) Regression, using original variables. (b) Regression,
translating X. (In panel (b), the new origin is at $x = X - \bar{X} = 0$; the old origin is at X = 0.)



Step 2. Fit the line in Figure 11-7b; i.e., fit the line

$$Y = a + bx \qquad (11\text{-}7)$$

to this scatter by selecting the values for a and b that satisfy the least squares
criterion, i.e., select those values of a and b that minimize

$$\sum (Y_i - \hat{Y}_i)^2 \qquad (11\text{-}8)$$

Since the fitted value $\hat{Y}_i$ is on our estimated line (11-7),

$$\hat{Y}_i = a + bx_i \qquad (11\text{-}9)$$

When this is substituted into (11-8), the problem becomes one of selecting
a and b to minimize the sum of squares,

$$S(a, b) = \sum (Y_i - a - bx_i)^2 \qquad (11\text{-}10)$$

The notation S(a, b) is used to emphasize that this expression depends on a
and b. As a and b vary (i.e., as various lines are tried), S(a, b) will vary too,
and we ask at what value of a and b it will be a minimum. This will give us
our optimum (least squares) line.

The simplest minimization technique is calculus, and it will be used in
the next paragraph. [Readers without calculus can minimize (11-10) with the
more cumbersome algebra of Appendix 11-1, and rejoin us where the resulting
theorem is stated below.]

Minimizing S(a, b) requires setting its partial derivatives with respect to
a and b equal to zero. In the first instance, setting the partial derivative with
respect to a equal to zero:

$$\frac{\partial}{\partial a}\sum (Y_i - a - bx_i)^2 = \sum 2(-1)(Y_i - a - bx_i) = 0 \qquad (11\text{-}11)$$

Dividing through by -2 and rearranging:

$$\sum Y_i - na - b\sum x_i = 0 \qquad (11\text{-}12)$$

Noting that $\sum x_i = 0$ by (11-6), we can solve for a.

$$a = \frac{\sum Y_i}{n} \quad \text{or} \quad a = \bar{Y} \qquad (11\text{-}13)$$

Thus our least squares estimate of a is simply the average value of Y; referring
to Figure 11-7, we see that this ensures that our fitted regression line must
pass through the point ($\bar{X}$, $\bar{Y}$), which may be interpreted as the center of
gravity of the sample of n points.

It is also necessary to set the partial derivative of (11-10) with respect
to b equal to zero,

$$\frac{\partial}{\partial b}\sum (Y_i - a - bx_i)^2 = \sum 2(-x_i)(Y_i - a - bx_i) = 0 \qquad (11\text{-}14)$$

$$\sum x_i (Y_i - a - bx_i) = 0 \qquad (11\text{-}15)$$

Rearranging,

$$\sum Y_i x_i - a\sum x_i - b\sum x_i^2 = 0$$

Noting that $\sum x_i = 0$, we can solve for b.

$$b = \frac{\sum Y_i x_i}{\sum x_i^2} \qquad (11\text{-}16)$$

Table 11-2 Least Squares Calculations

(1)       (2)       (3)                        (4)            (5)          (6)                              (7)                   (8)
$X_i$     $Y_i$     $x_i = X_i - \bar{X}$      $Y_i x_i$      $x_i^2$      $\hat{Y}_i = 60 + .068x_i$       $Y_i - \hat{Y}_i$     $(Y_i - \hat{Y}_i)^2$
                    $= X_i - 400$

100       40        -300                       -12,000        90,000       39.60                            .40                   .16
200       45        -200                       -9,000         40,000       46.40                            -1.40                 1.96
300       50        -100                       -5,000         10,000       53.20                            -3.20                 10.24
400       65        0                          0              0            60.00                            5.00                  25.00
500       70        100                        7,000          10,000       66.80                            3.20                  10.24
600       70        200                        14,000         40,000       73.60                            -3.60                 12.96
700       80        300                        24,000         90,000       80.40                            -.40                  .16

$\sum X_i = 2{,}800$, so $\bar{X} = 2{,}800/7 = 400$
$\sum Y_i = 420$, so $\bar{Y} = 420/7 = 60$
$\sum x_i = 0$
$\sum Y_i x_i = 19{,}000$
$\sum x_i^2 = 280{,}000$
$\sum (Y_i - \hat{Y}_i)^2 = 60.72$

a = 60
b = 19,000/280,000 = .068
$s^2 = \frac{1}{n-2}\sum (Y_i - \hat{Y}_i)^2 = 12.144$ and s = 3.48
Our results³ in (11-13) and (11-16) are important enough to restate as:

Theorem

With x values measured as deviations from
their mean, the least squares values of a
and b are

$$a = \bar{Y} \qquad (11\text{-}13)$$

$$b = \frac{\sum Y_i x_i}{\sum x_i^2} \qquad (11\text{-}16)$$



For the example problem in Table 11-1, a and b are calculated in the
first five columns in Table 11-2 (the last three columns may be ignored
until the next chapter). It follows that the least-squares equation is:

$$Y = 60 + .068x \qquad (11\text{-}17)$$

This fitted line is graphed in Figure 11-7b.

Step 3. If desired, this regression can now be retranslated back into
our original frame of reference in Figure 11-7a. Express (11-17) in terms of
the original X values:

$Y = 60 + .068(X - \bar{X})$
$\;\; = 60 + .068(X - 400)$
$\;\; = 60 + .068X - 27.2$
$Y = 32.8 + .068X$     (11-18)

This fitted line is graphed in Figure 11-7a.

A comparison of (11-17) and (11-18) confirms that the slope of our
fitted regression (b = .068) remains the same; the only difference is in the
intercept. Moreover, we note how easily the original intercept ($a_0 = 32.8$)
may be recovered.

An estimate of yield for any given fertilizer application is now easily
derived from our least squares equation (11-18). For example, if 350 lb of
fertilizer is to be applied, our best estimate of yield is

Y = 32.8 + .068(350) = 56.6 bushels/acre

The alternative least squares equation (11-17) yields exactly the same result.
When X = 350, then x = -50, and

Y = 60 + .068(-50) = 56.6
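
The three steps can be collected into a short routine. This is a minimal sketch of formulas (11-13) and (11-16); the function names `fit_line` and `predict` are hypothetical. Because the script keeps the unrounded slope (about .0679), the recovered intercept $a_0$ comes out near 32.9 rather than the 32.8 obtained in the text with b rounded to .068; the predicted yield at 350 lb still rounds to 56.6.

```python
def fit_line(xs, ys):
    """Least squares line, with X measured as deviations from its mean.

    Returns (a, b, a0): a is the intercept in terms of x = X - mean(X),
    b is the slope, and a0 is the intercept in terms of the original X.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    dev = [x - x_bar for x in xs]                 # Step 1: translate X
    b = sum(y * d for y, d in zip(ys, dev)) / sum(d * d for d in dev)   # (11-16)
    a = y_bar                                     # (11-13)
    a0 = a - b * x_bar                            # Step 3: back to original X
    return a, b, a0


def predict(x_value, a0, b):
    """Fitted yield for a given fertilizer application, in the form of (11-18)."""
    return a0 + b * x_value


fertilizer = [100, 200, 300, 400, 500, 600, 700]   # Table 11-1
wheat_yield = [40, 45, 50, 65, 70, 70, 80]

a, b, a0 = fit_line(fertilizer, wheat_yield)
estimate_350 = predict(350, a0, b)
residuals = [y - predict(x, a0, b) for x, y in zip(fertilizer, wheat_yield)]

print(round(a, 1), round(b, 3), round(a0, 1), round(estimate_350, 1))
```

The residuals of the fitted line sum to zero, as they must for any line passing through the point of means.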



3 To be perfectly rigorous, we could have shown that when the partial derivatives are set
equal to zero, we actually do have a minimum sum of squares, rather than a maximum,
saddle point or local minimum.

PROBLEMS
(Save your work in the next three chapters, for future reference.) 

11-1 Suppose a random sample of 5 families had the following income and 
savings: 

Family    Income Y    Savings S
A         $8,000      $600
B         11,000      1200
C         9,000       1000
D         6,000       700
E         6,000       300

(a) Estimate and graph the regression line of savings S on income Y.

(b) Interpret the intercepts a and $a_0$.

11-2 Use the data of Problem 11-1 to regress consumption C on income Y.
(Economists define C = Y - S.)
11-3 To interpret the regression slope b, use equation (11-18) to answer 

the following questions. 

(a) About how much is the yield increased for every pound of fertilizer 
applied? 

(b) If wheat were worth $2 per bushel and fertilizer cost $.25 per 
pound, would it be economical to apply fertilizer? 

(c) To what price approximately would fertilizer have to drop to 
make it economical? 

[The answer to (a) is simply the slope b. Economists refer to b as the 
"marginal" effect of fertilizer x on yield Y.] 
11-4 If we translated both X and Y into deviations x and y (just as X
was translated in Figure 11-7b), then

(a) What would the new y-intercept be? Would the slope remain the
same? Does not this imply that the fitted regression equation is simply

y = bx

(b) Prove that $\sum x_i y_i = \sum x_i Y_i$; hence we may alternatively write b
in terms of deviations as

$$b = \frac{\sum x_i y_i}{\sum x_i^2}$$


*11-5 (Requires calculus.) Suppose X is left in its original form, rather than 
being translated into x (deviations from the mean). 

(a) Write out the sum of squared deviations as in (11-10), in terms of 
$a_0$ and b. 

(b) Set equal to zero the partial derivatives with respect to $a_0$ and b, 
thus obtaining two so-called "normal" equations. 

(c) Evaluate these two normal equations using the data in Problem 
11-1 and solve for $a_0$ and b. Do you get the same answer? 

(d) Compare the two alternative methods of solution. 

*11-6 Suppose four firms had the following profits and research expenditures. 

Firm    Profit, P                 Research Expenditure, R 
        (thousands of dollars)    (thousands of dollars) 
1       50                        40 
2       60                        40 
3       40                        30 
4       50                        50 

(a) Fit a regression line of P on R. 

(b) Does this regression line "show how research generates profits?" 
Criticize. 



APPENDIX 11-1 AN ALTERNATIVE DERIVATION OF LEAST 
SQUARES ESTIMATES OF a AND b, 
WITHOUT CALCULUS 

Before estimating a and b, it is necessary to solve the theoretical problem 
of minimizing an ordinary quadratic function of one variable b, of the form 

$$ f(b) = k_2 b^2 + k_1 b + k_0 $$  (11-19) 

where $k_2$, $k_1$, $k_0$ are constants, with $k_2 > 0$. 
With a little algebraic manipulation, (11-19) may be written as 

$$ f(b) = k_2 \left( b + \frac{k_1}{2k_2} \right)^2 + \left( k_0 - \frac{k_1^2}{4k_2} \right) $$  (11-20) 

Note that b appears in the first term, but not in the second. Therefore our 
hope of minimizing the expression lies in selecting a value of b to minimize 
the first term. Being a square and hence never negative, the first term will be 
minimized when it is zero, that is, when 

$$ b + \frac{k_1}{2k_2} = 0 $$  (11-21) 

$$ b = -\frac{k_1}{2k_2} $$  (11-22) 

This result is shown graphically in Figure 11-8. To restate, a quadratic 
function of the form (11-19) is minimized by setting 

$$ b = -\frac{\text{(coefficient of first power)}}{2\,\text{(coefficient of second power)}} $$  (11-23) 
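The completing-the-square argument can be checked numerically. The following is a minimal sketch with hypothetical coefficients (any constants with $k_2 > 0$ would serve); it confirms that the rewritten form (11-20) agrees with (11-19) and that the value given by (11-22) is not beaten by nearby values of b.

```python
def f(b, k2, k1, k0):
    """The quadratic of (11-19)."""
    return k2 * b**2 + k1 * b + k0


def completed_square(b, k2, k1, k0):
    """The same quadratic rewritten as in (11-20)."""
    return k2 * (b + k1 / (2 * k2)) ** 2 + (k0 - k1**2 / (4 * k2))


def minimizer(k2, k1):
    """The minimizing value of b from (11-22); requires k2 > 0."""
    return -k1 / (2 * k2)


# Hypothetical coefficients, chosen only for illustration.
k2, k1, k0 = 3.0, -12.0, 5.0

b_star = minimizer(k2, k1)
f_min = f(b_star, k2, k1, k0)

# The two forms agree at several trial values of b.
forms_agree = all(
    abs(f(b, k2, k1, k0) - completed_square(b, k2, k1, k0)) < 1e-9
    for b in (-2.0, 0.0, 1.5, 2.0, 7.0)
)

# Moving away from b_star in either direction increases f.
is_minimum = all(
    f(b_star + d, k2, k1, k0) > f_min for d in (-1.0, -0.01, 0.01, 1.0)
)

print(b_star, f_min, forms_agree, is_minimum)
```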



With this theorem in hand, let us return to the problem of selecting 
values for a and b to minimize 

$$ S(a, b) = \sum [(Y_i - a) - b x_i]^2 $$  (11-24) 

It will be useful to manipulate this, as follows: 

$$ S(a, b) = \sum [(Y_i - a)^2 - 2b(Y_i - a)x_i + b^2 x_i^2] $$  (11-25) 

$$ = \sum (Y_i - a)^2 - 2b \sum (Y_i - a)x_i + b^2 \sum x_i^2 $$  (11-26) 

In the middle term, consider 

$$ \sum (Y_i - a)x_i = \sum Y_i x_i - a \sum x_i = \sum Y_i x_i + 0 $$

Using this to rewrite the middle term of (11-26) we have 

$$ S(a, b) = \sum (Y_i - a)^2 - 2b \sum Y_i x_i + b^2 \sum x_i^2 $$  (11-27) 

This is a useful recasting of (11-24), because the first term contains a alone, 
while the last 2 terms contain b alone. To find the value of a which minimizes 
(11-27) only the first term is relevant. This may be written 

$$ \sum (Y_i - a)^2 = \sum Y_i^2 - 2a \sum Y_i + n a^2 $$



FIG. 11-8 The minimization of a quadratic function, $f(b) = k_2 b^2 + k_1 b + k_0$. 

According to (11-23), this is minimized when 

$$ a = \frac{-(-2 \sum Y_i)}{2n} = \frac{\sum Y_i}{n} = \bar{Y} $$  (11-13) proved 

To find the value of b which minimizes (11-27), only the last two terms 
are relevant. According to (11-23), this is minimized when 

$$ b = \frac{-(-2 \sum Y_i x_i)}{2 \sum x_i^2} = \frac{\sum Y_i x_i}{\sum x_i^2} $$  (11-16) proved 
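As an illustration of (11-13) and (11-16), the sketch below computes a and b for a small data set and confirms that perturbing either estimate does not reduce the sum of squares S(a, b) of (11-24). The five observations are hypothetical and are not taken from the text.

```python
# Hypothetical observations, chosen only so the arithmetic is easy to follow.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

n = len(X)
x_bar = sum(X) / n
x = [Xi - x_bar for Xi in X]  # deviations from the mean; they sum to zero

# Least squares estimates in deviation form.
a = sum(Y) / n                                                    # (11-13)
b = sum(Yi * xi for Yi, xi in zip(Y, x)) / sum(xi**2 for xi in x)  # (11-16)


def S(a_try, b_try):
    """Sum of squared deviations (11-24) for trial values of a and b."""
    return sum((Yi - a_try - b_try * xi) ** 2 for Yi, xi in zip(Y, x))


# No small change in a or b should produce a smaller sum of squares.
perturbations = [(-0.1, 0.0), (0.1, 0.0), (0.0, -0.1), (0.0, 0.1), (0.1, 0.1)]
is_minimum = all(S(a + da, b + db) > S(a, b) for da, db in perturbations)

print(a, round(b, 3), round(S(a, b), 3), is_minimum)
```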



chapter 12 

Regression Theory 

12-1 THE MATHEMATICAL MODEL 

So far we have only mechanically fitted a line. This yielded a and b, 
which are descriptive statistics of the sample (like $\bar{X}$ in Chapter 2); now we 
wish to make inferences about the parent population (like our inferences 
about $\mu$ in Chapter 7). Specifically we must consider the mathematical model 
which allows us to run tests of significance on a and b. 

Turning back to the example in Section 11-1, suppose that the experiment 
could be repeated many times at a fixed value of x. Even though 
fertilizer application is fixed from experiment to experiment, we would not 
observe exactly the same yield each time. Instead, there would be some 
statistical fluctuation of the Y's, clustered about a central value. We can 
think of the many possible values of Y forming a population; the probability 
function of Y for a given x we shall call¹ $p(Y | x)$. Moreover, there will be a 
similar probability function for Y at any other experimental level of x. One 
possible sequence of Y populations is shown in Figure 12-1a. There would 
obviously be mathematical problems involved in analyzing such a population. 
To keep the problem manageable, we make a reasonable set of assumptions 
about the regularity of these populations, as shown in Figure 12-1b. We 
assume the probability functions $p(Y_i | x_i)$ have 

1. The same variance $\sigma^2$ for all $x_i$; and 

2. Means $E(Y_i)$ lying on a straight line, known as the true regression 
line: 

$$ E(Y_i) = \alpha + \beta x_i $$  (12-1) 

1 Remember that our notation conventions are different from Chapters 4 to 7. Now a 
capital letter denotes an original observation and a small letter denotes its deviation from 
the mean. 

FIG. 12-1 (a) General populations of Y, given x. (b) The special form of the populations 
of Y assumed in simple linear regression. 



The population parameters $\alpha$ and $\beta$ specify the line; they are to be estimated 
from sample information. We also assume that 

3. The random variables $Y_i$ are statistically independent. For example, 
a large value of $Y_1$ does not tend to make $Y_2$ large; i.e., $Y_2$ is "unaffected" 
by $Y_1$. 

These assumptions may be written more concisely as: 

The random variables $Y_i$ are statistically independent, with 
mean = $\alpha + \beta x_i$ 
and variance = $\sigma^2$                                        (12-2) 

On occasion, it is useful to describe the deviation of $Y_i$ from its expected 
value as the error or disturbance term $e_i$, so that the model may alternatively 
be written 

$$ Y_i = \alpha + \beta x_i + e_i $$
where the $e_i$ are independent random variables, with 
mean = 0 
and variance = $\sigma^2$                                        (12-4) 

We note that the distributions of Y and e are identical, except that 
their means differ. In fact, the distribution of e is just the distribution of Y 
translated onto a zero mean. 

No assumption is made yet about the shape of the distribution of e 
(normal, or otherwise). We therefore refer to assumptions (12-4) as the 
"weak set"; we shall derive as many results as possible from these, before 
adding a more restrictive normality assumption later. 

12-2 THE NATURE OF THE ERROR TERM 

Now let us consider in more detail the "purely random" part of $Y_i$, the 
error or disturbance term $e_i$. Why does it exist? Or, why doesn't a precise 
and exact value of $Y_i$ follow, once the value of $x_i$ is given? 

The error may be regarded as the sum of two components: 

(a) Measurement Error 

There are various reasons why Y may be measured incorrectly. In 
measuring wheat yield, there may be an error due to sloppy harvesting or 
inaccurate weighing. If the example is a study of the consumption of families 
at various income levels, the measurement error in consumption might consist 
of budget and reporting inaccuracies. 

(b) Stochastic Error 

This occurs because of the inherent irreproducibility of biological and 
social phenomena. Even if there were no measurement error, continuous 
repetition of our wheat experiment using exactly the same amount of fertilizer 
would result in different yields; these differences are unpredictable and are 
called stochastic differences. They may be reduced by tighter experimental 
control, for example, by holding constant soil conditions, amount of water, 
etc. But complete control is impossible; seeds, for example, cannot be 
duplicated. Stochastic error may be regarded as the influence on Y of many 
omitted variables, each with an individually small effect. 

In the social sciences, controlled experiments are usually not possible. 
For example, an economist cannot hold U.S. national income constant for 
several years while he examines the effect of interest rate on investment. 
Since he cannot neutralize extraneous influences by holding them constant, 
his best alternative is to take them explicitly into account, by regressing Y 
on x and the extraneous factors. This is a useful technique for reducing 
stochastic error; it is called "multiple regression" and is discussed fully in 
the next chapter. 



12-3 ESTIMATING $\alpha$ AND $\beta$ 

Suppose that our true regression $Y = \alpha + \beta x$ is the dotted line shown 
in Figure 12-2. This will remain unknown to the statistician, whose job it is 
to estimate it as best he can by observing x and Y. Suppose at the first level 
$x_1$, the stochastic error $e_1$ takes on a negative value, as shown in the diagram; 
he will observe the Y and x combination at $P_1$. Similarly, suppose his only 
other two observations are $P_2$ and $P_3$, resulting from positive values of e. 

[Figure 12-2; vertical axis: p(Y|x)] 

Further, suppose the statistician estimates the true line by fitting a least 
squares line $\hat{Y} = a + bx$, applying the method of Chapter 11 to the only 
information he has: points $P_1$, $P_2$, and $P_3$. He would then come up with the 
solid estimating line in this figure. This is a critical diagram; before proceeding, 
the reader should be sure he can clearly distinguish between the true 
regression and its surrounding e distribution on the one hand, and the 
estimated regression line on the other. 

Unless the statistician is very lucky indeed, it is obvious that his estimated 
line will not be exactly on the true population line. The best he can 
hope for is that the least squares method of estimation will be close to the 
target. Specifically, we now ask: "How is the estimator a distributed around 
its target $\alpha$, and b around its target $\beta$?" 
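One way to build intuition for this question is to simulate the model. The sketch below is an illustration only: the "true" values of $\alpha$, $\beta$ and $\sigma$ are hypothetical, the x deviations are assumed to be seven equally spaced levels, and the errors are drawn from a uniform (deliberately nonnormal) distribution, since the weak set of assumptions does not specify the shape of the error distribution.

```python
import random

# Hypothetical "true" population values (unknown to a real statistician).
ALPHA, BETA, SIGMA = 60.0, 0.07, 3.0

# Assumed experimental levels of x, expressed as deviations (they sum to zero).
x = [-300, -200, -100, 0, 100, 200, 300]
SUM_X2 = sum(xi**2 for xi in x)


def draw_sample(rng):
    """Generate one sample of Y values from Y_i = ALPHA + BETA*x_i + e_i.

    Errors are uniform on [-h, h]; this has mean 0 and variance h^2/3,
    so h = SIGMA*sqrt(3) gives variance SIGMA^2.
    """
    h = SIGMA * 3**0.5
    return [ALPHA + BETA * xi + rng.uniform(-h, h) for xi in x]


def least_squares(Y):
    """Least squares estimates a and b in deviation form."""
    a = sum(Y) / len(Y)
    b = sum(Yi * xi for Yi, xi in zip(Y, x)) / SUM_X2
    return a, b


rng = random.Random(0)  # fixed seed so the run is repeatable
estimates = [least_squares(draw_sample(rng)) for _ in range(5000)]

mean_a = sum(a for a, _ in estimates) / len(estimates)
mean_b = sum(b for _, b in estimates) / len(estimates)

print(round(mean_a, 2), round(mean_b, 4))
```

Individual fitted lines miss the true line, but over many repeated samples the estimates average out close to their targets.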



12-4 THE MEAN AND VARIANCE OF a AND b 



We shall show that the random estimators a and b have the following 
moments: 

$$ E(a) = \alpha $$  (12-5) 

$$ \operatorname{var}(a) = \frac{\sigma^2}{n} $$  (12-6) 

$$ E(b) = \beta $$  (12-7) 

$$ \operatorname{var}(b) = \frac{\sigma^2}{\sum x_i^2} $$  (12-8) 

where $\sigma^2$ is the variance of the error (the variance of Y). We note from (12-5) 
and (12-7) that both a and b are unbiased estimators of $\alpha$ and $\beta$. Because of 
its greater importance we shall concentrate on the slope estimator b, rather 
than a, for the rest of the chapter. 

Proof of (12-7) and (12-8). The formula for b in (11-16) may be rewritten as 

$$ b = \sum \left( \frac{x_i}{k} \right) Y_i $$  (12-9) 

where 

$$ k = \sum x_i^2 $$  (12-10) 

Thus 

$$ b = \sum w_i Y_i $$  (12-11) 

where 

$$ w_i = \frac{x_i}{k} $$  (12-12) 

Since each $x_i$ is a fixed constant, so is each $w_i$. Thus from (12-11) we establish 
the important conclusion, 

b is a weighted sum (i.e., a linear combination) 
of the random variables $Y_i$                                  (12-13) 

Hence by the rule in Chapter 5 for the mean of a linear combination we may write 

$$ E(b) = w_1 E(Y_1) + w_2 E(Y_2) + \cdots + w_n E(Y_n) = \sum w_i E(Y_i) $$  (12-14) 

Moreover, noting that the variables $Y_i$ are assumed independent, by (5-34) 
we may write 

$$ \operatorname{var}(b) = w_1^2 \operatorname{var} Y_1 + \cdots + w_n^2 \operatorname{var} Y_n = \sum w_i^2 \operatorname{var} Y_i $$  (12-15) 

For the mean, from (12-14) and (12-1) 

$$ E(b) = \sum w_i [\alpha + \beta x_i] $$  (12-16) 

$$ = \alpha \sum w_i + \beta \sum w_i x_i $$  (12-17) 

and noting (12-12) 

$$ E(b) = \frac{\alpha}{k} \sum x_i + \frac{\beta}{k} \sum x_i^2 $$  (12-18) 

but $\sum x_i$ is zero, according to (11-6). Thus 

$$ E(b) = 0 + \frac{\beta}{k} \sum x_i^2 $$

From (12-10) 

$$ E(b) = \beta $$  (12-7) proved 

For the variance, from (12-15) and (12-2) 

$$ \operatorname{var}(b) = \sum w_i^2 \sigma^2 $$  (12-19) 

$$ = \sum \frac{x_i^2}{k^2} \sigma^2 $$  (12-20) 

$$ = \frac{\sigma^2}{k^2} \sum x_i^2 $$  (12-21) 

Again noting (12-10), 

$$ \operatorname{var}(b) = \frac{\sigma^2}{\sum x_i^2} $$  (12-8) proved 

A similar derivation of the mean and variance of a is left as an exercise, 
completing the proof. We observe from (12-12) that in calculating b, the 
weight $w_i$ attached to the $Y_i$ observation is proportional to the deviation $x_i$. 
Hence the outlying observations will exert a relatively heavy influence in 
the calculation of b. 
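The properties of the weights used in this proof can be verified directly. The sketch below assumes seven x deviations running from -300 to 300 in steps of 100, an assumption that is consistent with n = 7 and $\sum x_i^2$ = 280,000 used in this chapter's fertilizer example; the Y values are hypothetical. It also shows the heavier influence of an outlying observation.

```python
# Assumed x deviations (consistent with n = 7 and sum of squares 280,000).
x = [-300, -200, -100, 0, 100, 200, 300]

k = sum(xi**2 for xi in x)      # (12-10)
w = [xi / k for xi in x]        # (12-12)

sum_w = sum(w)                                  # should be 0, since sum(x) = 0
sum_wx = sum(wi * xi for wi, xi in zip(w, x))   # should be 1
sum_w2 = sum(wi**2 for wi in w)                 # should be 1/k, giving (12-8)


def slope(Y):
    """b as the weighted sum of the Y values, as in (12-11)."""
    return sum(wi * Yi for wi, Yi in zip(w, Y))


# Hypothetical yields, used only to show the influence of each observation.
Y = [40.0, 45.0, 50.0, 65.0, 70.0, 70.0, 80.0]

Y_outer = Y.copy()
Y_outer[-1] += 1.0   # raise the outlying observation (x = 300) by one unit
Y_center = Y.copy()
Y_center[3] += 1.0   # raise the central observation (x = 0) by one unit

change_outer = slope(Y_outer) - slope(Y)
change_center = slope(Y_center) - slope(Y)

print(k, sum_w, round(sum_wx, 6), change_outer, change_center)
```

A unit change in the observation at x = 300 shifts b by its weight 300/280,000, while the same change at x = 0 leaves b untouched.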



*12-5 THE GAUSS-MARKOV THEOREM 

This is the major justification of using the least squares method in the 
linear regression model. 



Gauss-Markov Theorem. Within the class of linear 
unbiased estimators of $\beta$ (or $\alpha$), the least squares 
estimator has minimum variance.                           (12-22) 

This theorem is important because it follows even from the weak set of 
assumptions (12-4), and hence requires no assumption of the shape of the 
distribution of the error term. A proof may be found in most mathematical 
statistics texts. 

To interpret this important theorem, consider b, the least squares 
estimator of $\beta$. We have already seen in (12-13) that it is a linear estimator, 
and we restrict ourselves to linear estimators because they are easy to analyze 
and understand. We restrict ourselves even further, as shown in Figure 12-3; 
within this set of linear estimators we consider only the limited class that are 
unbiased. The least squares estimator not only is in this class, according to 
(12-7), but of all the estimators in this class it has the minimum variance. 
It is often, therefore, referred to as the "best linear unbiased estimator." 
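To make the theorem concrete, the least squares slope can be compared with one competing linear unbiased estimator. The sketch below is an illustration, not a proof. The competitor (the slope of the line joining the two end observations) and the x deviations are assumptions chosen for the example. From (12-17), a weighted sum $\sum c_i Y_i$ is unbiased for $\beta$ when $\sum c_i = 0$ and $\sum c_i x_i = 1$, and by the argument of (12-15) its variance is $\sigma^2 \sum c_i^2$.

```python
# Assumed x deviations (seven equally spaced levels summing to zero).
x = [-300, -200, -100, 0, 100, 200, 300]
k = sum(xi**2 for xi in x)

# Least squares weights, as in (12-12).
w_ls = [xi / k for xi in x]

# A competing linear estimator: (Y_n - Y_1) / (x_n - x_1), which uses only
# the two end observations and ignores the rest.
span = x[-1] - x[0]
w_end = [0.0] * len(x)
w_end[0] = -1.0 / span
w_end[-1] = 1.0 / span


def is_unbiased_for_slope(c):
    """Check sum(c) = 0 and sum(c*x) = 1, the conditions implied by (12-17)."""
    return (abs(sum(c)) < 1e-12
            and abs(sum(ci * xi for ci, xi in zip(c, x)) - 1.0) < 1e-12)


def variance_factor(c):
    """Variance of sum(c_i * Y_i) divided by sigma^2, as in (12-15)."""
    return sum(ci**2 for ci in c)


ratio = variance_factor(w_end) / variance_factor(w_ls)
print(is_unbiased_for_slope(w_ls), is_unbiased_for_slope(w_end), round(ratio, 3))
```

Both estimators are linear and unbiased, but under these assumed x values the end-point estimator has roughly one and a half times the variance of least squares.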

The Gauss-Markov theorem has an interesting corollary. As a special 
case of regression, we might ask what happens if we are explaining Y, but 
$\beta = 0$ in (12-2), so that no independent variable x comes into play. From 
(12-2), $\alpha$ is the mean of the Y population ($\mu$). Moreover, from (11-13) its 
least squares estimator is $\bar{Y}$. Thus, the least squares estimator of a population 
mean ($\mu$) is the sample mean ($\bar{Y}$), and the Gauss-Markov theorem fully 
applies: the sample mean is the best linear unbiased estimator of a population 
mean. 

FIG. 12-3 Diagram of the restricted class of estimators considered in the Gauss-Markov 
theorem. 

It must be emphasized that the Gauss-Markov theorem is restricted, 
applying only to estimators that are both linear and unbiased. It follows that 
there may be a biased or nonlinear estimator that is better (i.e., has smaller 
variance) than the least squares estimator. For example, to estimate a 
population mean, the sample median is a nonlinear estimator. It is better than 
the sample mean for certain kinds of nonnormal populations. The sample 
median is just one example of a whole collection of nonlinear statistical 
methods known as "distribution-free" or "nonparametric" statistics. These 
are expressly designed for inference when the population cannot be assumed 
to be normally distributed. 



12-6 THE DISTRIBUTION OF b 



With the mean and variance of b established in (12-7) and (12-8), we now 
ask: "What is the shape of the distribution of b?" If we add (for the first 
time) the strong assumption that the $Y_i$ are normal, and recall that b is a 
linear combination of the $Y_i$, it follows that b will also be normal from 
(6-13). But even without assuming the $Y_i$ are normal, as sample size increases 
the distribution of b will usually approach normality; this can be justified by 
a generalized form² of the central limit theorem (6-15). 

We are now in a position to graph the distribution of b in Figure 12-4, 
in order to develop a clear intuitive idea of how this estimator varies from 
sample to sample. First, of course, we note that (12-7) established that b 
is an unbiased estimator, so that the distribution of b is centered on its 
target $\beta$. 

The interpretation of the variance (12-8) is more difficult. Suppose that 
the experiment had been badly designed with the $X_i$'s close together. This 
makes the deviations $x_i$ small; hence $\sum x_i^2$ is small. Therefore $\sigma^2 / \sum x_i^2$, the 
variance of b from (12-8), is large and b is a comparatively unreliable estimator. 
To check the intuitive validity of this, consider the scatter diagram in Figure 
12-5a. The bunching of the X's means that the small part of the line being 
investigated is obscured by the error e, making the slope estimate b very 
unreliable. In this specific instance, our estimate has been pulled badly out 
of line by the errors; in particular, the one indicated by the arrow. 

2 The central limit theorem (6-15) concerned the normality of the sample mean $\bar{X}$. In 
Problem 6-8 it was seen to apply equally well to the sample sum S. It applies also to a 
weighted sum of random variables such as b in (12-13), under most conditions. See for 
example, D. A. S. Fraser, Nonparametric Statistics, New York: John Wiley, 1957. Similarly 
the normality of a is justified. 

FIG. 12-4 The probability distribution of the estimator b. [Axes: p(b) against b, 
centered at $E(b) = \beta$.] 

FIG. 12-5 (a) Unreliable estimate when $X_i$ are very close. (b) More reliable fit because 
$X_i$ are spread out. [Each panel shows the unknown true regression and the estimated 
regression.] 

By contrast, in Figure 12-5b we show the case where the X's are reasonably 
spread out. Even though the error e remains the same, the estimate b 
is much more reliable, because errors no longer exert the same leverage. 

As a concrete example, suppose we wish to examine how sensitive 
Canadian imports (Y) are to the international value of the Canadian dollar 
(x). A much more reliable estimate should be possible using the period 1948 
to 1962 when the Canadian dollar was flexible (and took on a range of 
values) than in the period before or since when this dollar was fixed (and 
only allowed to fluctuate within a very narrow range). 
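The effect of the spread of the X's on the reliability of b follows directly from (12-8), and a short calculation makes it tangible. In the sketch below both designs and the error standard deviation are hypothetical, chosen only to contrast a bunched design with a spread-out one.

```python
from math import sqrt


def sd_of_b(sigma, X):
    """Standard deviation of b from (12-8): sigma / sqrt(sum of x_i^2)."""
    mean = sum(X) / len(X)
    sum_x2 = sum((Xi - mean) ** 2 for Xi in X)
    return sigma / sqrt(sum_x2)


sigma = 3.48  # illustrative error standard deviation

# Two hypothetical designs with the same number of observations and mean.
X_spread = [100, 200, 300, 400, 500, 600, 700]
X_bunched = [370, 380, 390, 400, 410, 420, 430]

sd_spread = sd_of_b(sigma, X_spread)
sd_bunched = sd_of_b(sigma, X_bunched)
ratio = sd_bunched / sd_spread

print(round(sd_spread, 5), round(sd_bunched, 5), round(ratio, 1))
```

Shrinking the spread of the X's by a factor of ten inflates the standard deviation of b by the same factor, with no change in the error itself.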



12-7 CONFIDENCE INTERVALS AND TESTING HYPOTHESES 
ABOUT $\beta$ 

With the mean, variance and normality of the estimator b established, 
statistical inferences about $\beta$ are now in order. Our argument will be similar 
to the inference about $\mu$ in Section 8-2. First standardize the estimator b, 
obtaining 

$$ Z = \frac{b - \beta}{\sigma / \sqrt{\sum x_i^2}} $$  (12-23) 

where $Z \sim N(0, 1)$. 

Since $\sigma^2$, the variance of Y, is generally unknown, it is estimated with 

$$ s^2 = \frac{1}{n - 2} \sum (Y_i - \hat{Y}_i)^2 $$  (12-24) 

where $\hat{Y}_i$ is the fitted value of Y on the estimated regression line: i.e., 

$$ \hat{Y}_i = a + b x_i $$  (12-25) 

$s^2$ is often referred to as "residual variance," a term similarly used in ANOVA. 
The divisor (n - 2) is used in (12-24) rather than n in order to make $s^2$ an 
unbiased estimator³ of $\sigma^2$. When this substitution of $s^2$ for $\sigma^2$ is made, the 
standardized estimator is no longer normal, but instead has the slightly more 
spread-out t distribution: 

$$ t = \frac{b - \beta}{s / \sqrt{\sum x_i^2}} $$  (12-26) 

For the t distribution to be strictly valid, we require the strong assumption 
that the distribution of $Y_i$ is normal. From (12-26) we may now proceed to 
construct a confidence interval or test an hypothesis. 

3 As argued in the footnote to equation (8-11). But in the present calculation of $s^2$, two 
estimators a and b are required; thus there remain two fewer degrees of freedom for $s^2$. 
Hence (n - 2) is the divisor in $s^2$, and also the degrees of freedom of the subsequent t 
distribution in (12-26). 



(a) Confidence Intervals 

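Again letting $t_{.025}$ denote the t value which leaves 2½% of the distribution 
in the upper tail, 

$$ \Pr(-t_{.025} < t < t_{.025}) = .95 $$  (12-27) 

Substituting for t from (12-26) 

$$ \Pr\left( -t_{.025} < \frac{b - \beta}{s / \sqrt{\sum x_i^2}} < t_{.025} \right) = .95 $$  (12-28) 

The inequalities within the bracket may be reexpressed 

$$ \Pr\left( b - t_{.025} \frac{s}{\sqrt{\sum x_i^2}} < \beta < b + t_{.025} \frac{s}{\sqrt{\sum x_i^2}} \right) = .95 $$  (12-29) 

which yields 

The 95% confidence interval⁴ for $\beta$: 

$$ \beta = b \pm t_{.025} \frac{s}{\sqrt{\sum x_i^2}} $$  (12-30) 

where $t_{.025}$ has (n - 2) degrees of freedom. 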

4 Using a similar argument, and noting (12-6), the 95% confidence interval for $\alpha$ is: 

$$ \alpha = a \pm t_{.025} \frac{s}{\sqrt{n}} $$  (12-31) 

We note that this is very similar to the confidence interval for $\mu$ given by equation (8-15). 

For our example of wheat yield in the previous chapter, the confidence 
interval for $\beta$ (the effect of fertilizer on yield) is computed as follows. s is 
evaluated in the last three columns of Table 11-2. Also noting the values 
for b and $\sum x_i^2$ calculated in that table, our 95% confidence interval (12-30) 
becomes 

$$ \beta = .068 \pm 2.571 \frac{3.48}{\sqrt{280,000}} $$

$$ = .068 \pm .017 $$

$$ .051 < \beta < .085 $$  (12-32) 
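The arithmetic behind (12-32) can be reproduced in a few lines. This sketch simply plugs the quantities quoted in the text into (12-30); the variable names are hypothetical, and the critical value 2.571 is the tabulated $t_{.025}$ used in the text for n - 2 = 5 degrees of freedom.

```python
from math import sqrt

# Quantities quoted in the text for the fertilizer example.
b = 0.068          # estimated slope
s = 3.48           # residual standard deviation
sum_x2 = 280_000   # sum of squared x deviations
t_025 = 2.571      # critical t value used in the text (5 d.f.)

se_b = s / sqrt(sum_x2)       # estimated standard deviation of b
half_width = t_025 * se_b     # the "plus or minus" term of (12-30)
lower = b - half_width
upper = b + half_width

print(round(half_width, 3), round(lower, 3), round(upper, 3))
```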



Testing hypotheses. A two-sided test of any hypothesis may be carried 
out simply by noting whether or not the confidence interval (12-30) contains 
that hypothesis. For example, the hypothesis typically tested is the null 
hypothesis 

$$ H_0: \beta = 0 $$  (12-33) 

i.e., using our example, that fertilizer has no effect on yield. In testing this 
against the two-sided alternative 

$$ H_1: \beta \neq 0 $$  (12-34) 

$H_0$ must be rejected at a 5% significance level, since the null value of zero 
is not contained in (12-32). 

Since fertilizer is expected to favorably affect yield, it seems more 
appropriate to test (12-33) against the one-sided alternative: 

$$ H_1: \beta > 0 $$  (12-35) 

The first step is to calculate the t statistic under the assumption that $H_0$ is 
true. From (12-26) and (12-33) this reduces to 

$$ t = \frac{b}{s / \sqrt{\sum x_i^2}} $$  (12-36) 

and in our example: 

$$ t = \frac{.068}{3.48 / \sqrt{280,000}} = 10.3 $$  (12-37) 

Since this observed value exceeds the critical $t_{.05}$ value of 2.015, $H_0$ is rejected 
in favor of the conclusion that fertilizer favorably affects yield. 
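The one-sided test can be carried out with the same quantities. The sketch below evaluates (12-36) with the values quoted in the text and compares the result with the critical value 2.015 cited there; the variable names are hypothetical.

```python
from math import sqrt

# Quantities quoted in the text for the fertilizer example.
b = 0.068          # estimated slope
s = 3.48           # residual standard deviation
sum_x2 = 280_000   # sum of squared x deviations
t_crit_05 = 2.015  # one-sided 5% critical value quoted in the text (5 d.f.)

# t statistic under the null hypothesis that the true slope is zero, (12-36).
t_stat = b / (s / sqrt(sum_x2))
reject_h0 = t_stat > t_crit_05

print(round(t_stat, 1), reject_h0)
```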



12-8 PREDICTION INTERVAL FOR $Y_0$ 

If we plan one new application of 550 pounds of fertilizer ($x_0 = 150$), 
how do we predict the resulting yield? 

The best point estimate will be the corresponding fitted Y value on our 
estimated regression line, i.e.: 

$$ \hat{Y}_0 = a + b x_0 $$  (12-38) 

$$ = 60 + .068(150) = 70.2 \text{ bu/acre} $$  (12-39) 

FIG. 12-6 How the estimator $\hat{Y}_0$ is related to the target $E(Y_0)$. [The figure shows 
the true regression $Y = \alpha + \beta x$ and an estimated regression $\hat{Y} = a + bx$.] 

But as a point estimate, this will almost certainly involve some error because, 
for example, of errors made in calculating a and b. Figure 12-6 illustrates 
the effect of these errors; the true regression is shown, along with an estimated 
regression. Note how the fitted $\hat{Y}_0$ in this case underestimates. In Figure 12-7 
the true regression is again shown along with several estimated regressions 
fitted from several possible sets of sample data. The fitted value is sometimes 
too low, sometimes too high, but on the average, just right. 

The important observation in Figure 12-7 is that if $x_0$ were further to 
the right, then our estimates would be spread out over an even wider range. 
On the other hand if $x_0$ were further to the left and closer to its central value 
of zero then our estimates would be less spread out. Moreover, it is the error 
in b that causes this; thus such an error in the slope b will do little harm in 
predicting yield given an average amount of fertilizer; but any prediction of 
the effect of an extreme amount of fertilizer will be thrown badly into error. 

FIG. 12-7 $\hat{Y}_0$ as an unbiased estimator of $E(Y_0)$. 

Formally, it may be shown that⁵ the 95% prediction interval for an 
individual Y observation is 

$$ Y_0 = \hat{Y}_0 \pm t_{.025}\, s \sqrt{\frac{1}{n} + \frac{x_0^2}{\sum x_i^2} + 1} $$  (12-42) 

where $t_{.025}$ has (n - 2) d.f. 

For example, we can now construct a prediction interval for yield if 
550 lb/acre of fertilizer were applied. With a 95% chance of being right we 
predict: 

$$ Y_0 = 70.2 \pm 2.571(3.48) \sqrt{\frac{1}{7} + \frac{150^2}{280,000} + 1} $$

$$ 60.3 < Y_0 < 80.1 $$  (12-43) 

This prediction interval is shown in Figure 12-8. Moreover, the same calculation 
for all possible $x_0$ values yields the dotted band of prediction intervals, 
expanding as $x_0$ moves farther away from its central value of zero. 
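The prediction interval (12-43) and the widening band of intervals can both be reproduced from (12-42). The sketch below uses only the quantities quoted in the text; the function name and the particular $x_0$ values compared at the end are hypothetical choices for illustration.

```python
from math import sqrt

# Quantities quoted in the text for the fertilizer example.
a, b = 60.0, 0.068     # fitted line in deviation form
s = 3.48               # residual standard deviation
n = 7                  # sample size
sum_x2 = 280_000       # sum of squared x deviations
t_025 = 2.571          # critical t value used in the text (5 d.f.)


def prediction_interval(x0):
    """95% prediction interval for one new Y at deviation x0, from (12-42)."""
    y_hat = a + b * x0
    half_width = t_025 * s * sqrt(1 / n + x0**2 / sum_x2 + 1)
    return y_hat - half_width, y_hat + half_width


lower, upper = prediction_interval(150)   # 550 lb of fertilizer

# Width of the interval at the center, at x0 = 150, and at the edge of the data.
widths = {}
for x0 in (0, 150, 300):
    lo, hi = prediction_interval(x0)
    widths[x0] = hi - lo

print(round(lower, 1), round(upper, 1))
print({x0: round(wd, 2) for x0, wd in widths.items()})
```

The interval is narrowest at $x_0 = 0$ and grows steadily as $x_0$ moves outward, which is the dotted band described in the text.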

It should be emphasized that $x_0$ may be any value of x. If $x_0$ lies among 
the observed values $x_1 \cdots x_n$, the process is called "interpolation." If $x_0$ is 
one of the values $x_1 \cdots x_n$, the process might be called "using also the other 
values of x to sharpen our knowledge of this one population at $x_0$." If $x_0$ is 
beyond $x_1$ or $x_n$, then the process is called "extrapolation." The techniques 
developed in this section may be used for extrapolation, but only with 
considerable caution, as we shall see in the next section. 

5 Without going into the proof of (12-42), we sketch its plausibility. The variance involved 
in a prediction is roughly 

$$ \operatorname{var} = \operatorname{var}(a) + \operatorname{var}(b)\, x_0^2 + \operatorname{var}(Y) $$  (12-40) 

that is, the variance of a, plus the variance of b weighted with $x_0^2$, plus the inherent variance 
of any Y observation. This last source of error must be included of course; even if $\alpha$ and $\beta$ 
were known exactly, the prediction of $Y_0$ would still be subject to error. 
Into (12-40) we substitute (12-6), (12-8), and (12-2): 

$$ \operatorname{var} = \frac{\sigma^2}{n} + x_0^2 \left( \frac{\sigma^2}{\sum x_i^2} \right) + \sigma^2 $$  (12-41) 

When s is substituted for $\sigma$, (12-42) follows. 

FIG. 12-8 Prediction interval for $Y_0$. 



PROBLEMS 

12-1 Construct a 95% confidence interval for the regression coefficient $\beta$ in 

(a) Problem 11-1. 

(b) Problem 11-2. 

12-2 Which of the following hypotheses does the data of Problem 11-1 
prove to be unacceptable at the 5% level of significance? 

(a) $\beta = 0$ 

(b) $\beta = 1/2$ 

(c) $\beta = .1$ 

(d) $\beta = -.1$ 

12-3 At the 1% level of significance, use the data of Problem 11-1 to test 
the hypothesis that savings does not depend on income, against the 
alternative hypothesis that savings increases with income. 

12-4 Using the data of Problem 11-1, what is your 95% prediction interval 
for the savings of a family with an income of 

(a) $6,000 

(b) $8,000 

(c) $10,000 

(d) $12,000 

(e) Which of these four intervals is least precise? Most precise? 

(f) How is the answer to (b) related to the confidence interval found 
from (12-31)? 

12-5 Suppose you are trying to explain how the interest rate (i) affects 
investment (I) in the U.S. Would you prefer to take observations of i 
and I over a period in which the authorities were trying to hold interest 
constant, or a period in which it is allowed to vary widely? 



12-9 DANGERS OF EXTRAPOLATION 



There are two dangers in extrapolation, which we might call "mathematical" 
and "practical." In both cases, there is no sharp division between 
safe interpolation and dangerous extrapolation. Rather, there is continually 
increasing danger of misinterpretation as $x_0$ gets further and further from its 
central value. 

(a) Mathematical Danger 

It was emphasized in the previous section that prediction intervals get 
larger as $x_0$ moves away from zero. This is true, even if all the assumptions 
underlying our mathematical model hold exactly. 



(b) Practical Danger 



In practice it must be recognized that a mathematical model is never 
absolutely correct. Rather, it is a useful approximation. In particular, one 
cannot take seriously the hypothesis that the population means are strung 
out in an exactly straight line. If we consider the fertilizer example, it is likely 
that the true relation increases initially, but then bends down eventually as 
a "burning point" is approached, and the crop is overdosed. This is illustrated 
in Figure 12-9 which is an extension of Figure 11-2 with the scale appropriately 
reduced. In the region of interest, from 0 to 700 pounds, the relation 
is practically a straight line, and no great harm is done in assuming the linear 
model. However, if the linear model is extrapolated far beyond this region 
of experimentation, the result becomes meaningless. 

FIG. 12-9 Comparison of linear and nonlinear models. [Horizontal axis: Fertilizer X, 
marked from 100 to 1,000; the straight line is labeled "Linear model."] 

There are "nonlinear" models available, if they seem more appropriate. 
Moreover, statistical tests are available to help determine whether or not 
they are appropriate. These topics are covered in more advanced texts. 

*12-10 MAXIMUM LIKELIHOOD ESTIMATION 

Sections 12-1 to 12-5, including the Gauss-Markov justification of least 
squares, required no assumption of the normality of the error term (i.e., 
normality of Y). In Sections 12-6 to 12-9, the normality assumption was 
required only for small sample estimation, and this because of a quite 
general principle that small sample estimation requires a normally distributed 
parent population to validate the t distribution. In these last two sections we 
make the strong assumption of a normally distributed error throughout. On 
this premise, we derive the maximum likelihood estimates of $\alpha$ and $\beta$, i.e., 
those hypothetical population values of $\alpha$ and $\beta$ more likely than any others 
to generate the sample values we observed. These MLE of $\alpha$ and $\beta$ turn out 
to be the least squares estimates; thus maximum likelihood provides a second 
justification for using least squares. 

Before addressing the algebraic derivation, it is best to clarify what is 
going on with a bit of geometry. Specifically, why should the maximum 
likelihood line fit the data well? To simplify, assume a sample of only three 
observations ($P_1$, $P_2$, $P_3$). 

First, let us try out the line shown in Figure 12-10a. (Before examining 
it carefully, we note that it seems to be a pretty bad fit for our three observed 
points.) Temporarily, suppose this were the true regression line; then the 
distribution of errors would be centered around it as shown. The likelihood 
that such a population would give rise to the samples we observed is the 
probability density that we would get the particular set of three e values 
shown in this diagram. The probability density of the three values is shown 
as the ordinates above the points $P_1$, $P_2$, and $P_3$. Because our three observations 
are by assumption statistically independent, the likelihood of all three 
(i.e., the probability density of getting the sample we observed), is the product 
of these three ordinates. This likelihood seems relatively small, mostly 
because the very small ordinate of $P_1$ reduces the product value. Our intuition 
that this is a bad estimate is confirmed; such a hypothetical population is 
not very likely to generate our sample values. We should be able to do better. 

In Figure 12-10b it is evident that we can do much better. This hypothetical 
population is more likely to give rise to the sample we observed. 
The disturbance terms are collectively smaller, with their probability density 
being greater as a consequence. 

[Figure 12-10(a); labels: the second observation, its probability ordinate.] 

The MLE technique is seen to involve speculating on various possible 
populations. How likely is each to give rise to the sample we observed? 
Geometrically, our problem would be to try them all out, by moving the 
population through all its possible values, i.e., by moving the regression 
line and its surrounding e distribution through all possible positions in space. 
Each position involves a different set of trial values for $\alpha$ and $\beta$. In each case 
the likelihood of observing $P_1$, $P_2$, $P_3$ would be evaluated. For our MLE we 
choose that hypothetical population which maximizes this likelihood. It is 
evident that little further adjustment is required in Figure 12-10b to arrive 
at the MLE. This procedure seems to result, intuitively, in a good fit; moreover, 
since it seems similar to the least squares fit, it is no surprise that we 
shall be able to show that the two coincide. 

There are two other points worth noting. The MLE is derived from our 
three sample observations; another set of sample observations would almost 
certainly give rise to another MLE of $\alpha$ and $\beta$. The second point is more 
subtle. The likelihood of any population yielding our sample depends on 
not only the size of the e terms involved, but also on the shape of the e 
distribution, in particular $\sigma^2$, the variance of e. However, it can be shown 
that the maximum likelihood line does not depend on $\sigma^2$. In other words, if 
we assume $\sigma^2$ is larger, the geometry will look different, because e will have 
a flatter distribution; but the end result will be the same maximum likelihood 
line. 

While geometry has clarified the method, it hasn't provided a precise
means of arriving at the specific maximum likelihood estimate. This must be
done algebraically. For generality, suppose that we have a sample of size n,
rather than just 3. We wish to know

$$p(Y_1, Y_2, \ldots, Y_n) \tag{12-44}$$

the likelihood or probability density of the sample we observed, expressed
as a function of the possible population values of $\alpha$, $\beta$, and $\sigma^2$. First, consider
the probability density of the first value of Y, which is

$$p(Y_1) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(1/2\sigma^2)[Y_1 - (\alpha + \beta x_1)]^2} \tag{12-45}$$

This is simply the normal distribution of $Y_1$ with its mean $(\alpha + \beta x_1)$ and
variance $(\sigma^2)$ substituted into the appropriate positions. [In terms of the
geometry of Figure 12-10, $p(Y_1)$ is the ordinate above $P_1$.] The probability
density of the second Y value is similar to (12-45), except that the subscript
2 replaces 1 throughout, and so on, for all the other observed Y values.



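The independence of the Y values justifies multiplying all these
probabilities together to find (12-44). Thus

$$p(Y_1, Y_2, \ldots, Y_n) = \left[\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(1/2\sigma^2)[Y_1 - (\alpha + \beta x_1)]^2}\right] \left[\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(1/2\sigma^2)[Y_2 - (\alpha + \beta x_2)]^2}\right] \cdots = \prod_{i=1}^{n} \left[\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(1/2\sigma^2)[Y_i - (\alpha + \beta x_i)]^2}\right] \tag{12-46}$$

where $\prod$ represents the product of n factors. Using the familiar rule for
exponentials,* the product in (12-46) can be reexpressed by summing
exponents

$$p(Y_1, Y_2, \ldots, Y_n) = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-(1/2\sigma^2)\sum [Y_i - \alpha - \beta x_i]^2} \tag{12-47}$$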



Recall that the observed Y's are given. We are speculating on various
values of $\alpha$, $\beta$, and $\sigma^2$. To emphasize this, we rename (12-47) the likelihood
function

$$L(\alpha, \beta, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-(1/2\sigma^2)\sum [Y_i - \alpha - \beta x_i]^2} \tag{12-48}$$

We now ask, which values of $\alpha$ and $\beta$ make L largest? The only place $\alpha$ and
$\beta$ appear is in the exponent; moreover, maximizing a function with a negative
exponent involves minimizing the magnitude of the exponent. Hence our problem
is to choose $\alpha$ and $\beta$ in order to

$$\text{minimize } \sum [Y_i - \alpha - \beta x_i]^2 \tag{12-49}$$

Moreover, this provides the maximum likelihood solution for $\alpha$ and $\beta$
regardless of the value of $\sigma^2$. This is the proposition suggested in the
geometrical analysis in Figure 12-10; no matter what is assumed about the
spread of the distribution, the maximum likelihood line is not affected by it.

But an even more important conclusion follows from comparing
equation (12-49) with equation (11-10). Maximum likelihood estimates are identical
to least squares estimates. The selection of least squares estimates a and b
to minimize (11-10) is identical to the selection of maximum likelihood
estimates of $\alpha$ and $\beta$ to minimize (12-49). The only difference is that we've
called our estimates different names. This establishes our other important
theoretical justification of the least squares method: it is the estimate that
follows from applying maximum likelihood techniques to a model with
normally distributed error.
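
The coincidence of the two criteria can be checked numerically. The Python
sketch below is an added illustration, not part of the derivation above. It
uses the wheat yield and fertilizer observations (the Y and X columns of
Table 13-1); the function names `log_likelihood` and `grid_mle`, the grid
limits, and the two trial values of $\sigma^2$ are assumptions chosen only for
illustration. A crude grid search stands in for a proper maximization.

```python
import math

# Wheat yield (bu/acre) and fertilizer (lb/acre), as listed in Table 13-1.
Y = [40, 45, 50, 65, 70, 70, 80]
X = [100, 200, 300, 400, 500, 600, 700]
n = len(Y)
x_bar = sum(X) / n
x = [Xi - x_bar for Xi in X]  # regressor as deviations from its mean

# Least squares estimates in deviation form.
a_ls = sum(Y) / n
b_ls = sum(Yi * xi for Yi, xi in zip(Y, x)) / sum(xi * xi for xi in x)


def log_likelihood(alpha, beta, sigma2):
    """Logarithm of the likelihood function (12-48)."""
    ss = sum((Yi - alpha - beta * xi) ** 2 for Yi, xi in zip(Y, x))
    return -0.5 * n * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)


def grid_mle(sigma2):
    """Grid point (alpha, beta) with the largest likelihood (illustrative grid)."""
    alphas = [55 + 0.1 * i for i in range(101)]      # 55.0 to 65.0
    betas = [0.05 + 0.0005 * j for j in range(81)]   # 0.05 to 0.09
    return max(((al, be) for al in alphas for be in betas),
               key=lambda p: log_likelihood(p[0], p[1], sigma2))


mle_small = grid_mle(sigma2=4.0)    # assumed small error variance
mle_large = grid_mle(sigma2=100.0)  # assumed large error variance

print("least squares:", a_ls, b_ls)
print("grid MLE, sigma2 = 4:  ", mle_small)
print("grid MLE, sigma2 = 100:", mle_large)
```

Up to the resolution of the grid, both searches should land on the least
squares values, whatever variance is assumed.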



* $e^a e^b = e^{a+b}$ for any a and b.

* 12-11 THE CHARACTERISTICS OF THE INDEPENDENT VARIABLE

So far it has been assumed that the independent variable x takes on a
given set of fixed values (for example, fertilizer application was set at certain
specified levels). But in many cases x cannot be controlled in this way. Thus
if we are examining the effect of rainfall on yield, it must be recognized that
x (rainfall) is a random variable, completely outside our control. The
surprising thing is that the same MLE follows whether x is fixed or a random
variable, if we assume [in addition to (12-4)] that

1. The distribution of x does not depend on $\alpha$, $\beta$, or $\sigma^2$. (12-50)

2. The distribution of e is independent of x, being $N(0, \sigma^2)$
for every $x_i$. (12-51)

The likelihood of our sample now involves the probability of observing
both x and Y. If the $x_i$ are independent, the likelihood function is

$$L = [p(x_1)\,p(Y_1 \mid x_1)]\,[p(x_2)\,p(Y_2 \mid x_2)] \cdots \tag{12-52}$$

Because of the normality assumption (12-51),

$$L = p(x_1)\,\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(1/2\sigma^2)(Y_1 - \alpha - \beta x_1)^2}\; p(x_2)\,\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(1/2\sigma^2)(Y_2 - \alpha - \beta x_2)^2} \cdots \tag{12-53}$$

Collecting the exponents

$$L = p(x_1)\,p(x_2) \cdots p(x_n)\,\frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-(1/2\sigma^2)\sum (Y_i - \alpha - \beta x_i)^2} \tag{12-54}$$

Since $p(x)$ does not depend on the parameters $\alpha$, $\beta$, and $\sigma^2$ according to
(12-50), the problem of maximizing this likelihood function with respect to
these parameters reduces to the minimization of the same sum of squares as
before. This holds true, in fact, even if the $x_i$ are not independent, and are
determined by a joint probability distribution; then (12-54) becomes:

$$L(\alpha, \beta, \sigma^2) = p(x_1, x_2, \ldots, x_n)\,\frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-(1/2\sigma^2)\sum (Y_i - \alpha - \beta x_i)^2} \tag{12-55}$$

again requiring the same (least squares) minimization.

We conclude that MLE and least squares coincide regardless of whether 
the independent variable x is fixed, or a random variable — if x is independent 
of the error and parameters in the equation being estimated. This greatly 
generalizes the application of the regression model. 



chapter 13 



Multiple Regression 



13-1 INTRODUCTORY EXAMPLE 

Suppose that the fertilizer and wheat yield observations in Chapter 11
were taken at several different agricultural experiment stations across the
country. Even if soil conditions and temperatures were essentially the same
in all these areas, we still might ask, "Can't part of the fluctuation in Y
(i.e., the disturbance term e) be explained by varying levels of rainfall in
different areas?" A better prediction of wheat yield may be possible if both
fertilizer and rainfall are examined. Notice how this argument is similar to
the one used in two-factor ANOVA: if the error e can be reduced by taking
rainfall into account, we will get a better explanation of how the other
variables are related. The observed levels of rainfall are shown in Table 13-1,
along with the original observations of wheat yield and fertilizer from
Table 11-1.



Table 13-1 Observed Wheat Yield, Fertilizer Application, and Rainfall

       Y               X              Z
  Wheat Yield     Fertilizer      Rainfall
   (bu/acre)       (lb/acre)      (inches)

      40              100            36
      45              200            33
      50              300            37
      65              400            37
      70              500            34
      70              600            32
      80              700            36



13-2 THE MATHEMATICAL MODEL

The multiple regression technique used to describe how a dependent
variable is related to two or more independent variables is in fact only an
extension of the simple regression analysis of the previous two chapters.
Yield Y is now to be regressed on the two independent variables, or
"regressors," fertilizer X and rainfall Z. Let us suppose it is reasonable to argue
that the model is of the form

$$E(Y_i) = \alpha + \beta x_i + \gamma z_i \tag{13-1}$$

with both regressors x and z measured as deviations from their means.
Geometrically this equation is a plane¹ in the three-dimensional space shown
in Figure 13-1. For any given combination of fertilizer and rainfall $(x_i, z_i)$,
the expected yield $E(Y_i)$ is the point on this plane directly above, shown
as a hollow dot. Of course, the observed value of Y, shown as a solid dot, is
very unlikely to fall precisely on this plane. For example, our particular
observed $Y_i$ at this fertilizer/rainfall combination is somewhat greater than
its expected value, and is shown as the solid dot lying directly above this
plane. The difference between any observed and expected value of $Y_i$ is the
stochastic or error term $e_i$. Thus any observed value $Y_i$ may be expressed as
its expected value plus this disturbance term

$$Y_i = \alpha + \beta x_i + \gamma z_i + e_i \tag{13-2}$$

with our assumptions about e the same as in Chapter 12.

FIG. 13-1 Scatter of observed points about the true regression plane
$E(Y) = \alpha + \beta x + \gamma z$.

1 It is a plane because it is linear in x and z. Looked at from another point of view we
could say that (13-1) is linear in $\alpha$, $\beta$, and $\gamma$. In fact, this latter linearity assumption is the
more important of the two, since we are involved in estimating $\alpha$, $\beta$, and $\gamma$; it is this
assumption that keeps our estimating equations (13-4) linear.

$\beta$ is geometrically interpreted as the slope of the plane as we move in a
direction parallel to the (x, Y) plane, i.e., keep z constant; thus $\beta$ is the
marginal effect of fertilizer x on yield Y. Similarly $\gamma$ is geometrically
interpreted as the slope of the plane as we move in a direction parallel to the (z, Y)
plane, i.e., keep x constant; hence $\gamma$ is the marginal effect of z on Y.



13-3 LEAST SQUARES ESTIMATION 

Least squares estimates are derived by selecting the estimates of $\alpha$, $\beta$,
and $\gamma$ that minimize the sum of the squared deviations between the observed
Y's and the fitted Y's; i.e., minimize²

$$\sum (Y_i - a - bx_i - cz_i)^2 \tag{13-3}$$

where a, b, and c are our estimators of $\alpha$, $\beta$, and $\gamma$. This is done with calculus
by setting the partial derivatives of this function with respect to a, b, and c



2 Maximum likelihood estimates of $\alpha$, $\beta$, and $\gamma$ are derived in the same way as in the simple
regression case; again this coincides with least squares. Geometrically, this involves trying
out all possible hypothetical regression planes in Figure 13-1, and selecting that one that
is most likely to generate the solid-dot sample values we actually observed.

But first, note that Figure 13-1 involves 3 parameters ($\alpha$, $\beta$, and $\gamma$), and 3 variables
(Y, x, and z); however, there is one additional variable in our system, $p(Y \mid x, z)$, which
has not yet been plotted. It may appear that there is no way of forcing 4 variables into
a three-dimensional space, but this is not so. For example, economists often plot 3 variables
(labor, capital, and output) in a two-dimensional labor-capital space by introducing the
third output variable as a system of isoquants. Those for whom this is a familiar exercise
should have little trouble in graphing four variables [Y, x, z, and $p(Y \mid x, z)$] in a
three-dimensional (Y, x, and z) space by introducing the fourth variable [$p(Y \mid x, z)$] as a system
of isoplanes. Each of these isoplanes represents (Y, x, z) combinations that are
equiprobable (i.e., for which the probability density of Y is constant). Thus the complete
geometric model is the regression plane shown in Figure 13-1, with isoprobability planes
stacked above and below it. Our assumptions about the error term (12-4) guarantee that
the isoprobability planes will be parallel to the true regression plane.

For MLE, we introduce the additional assumption that the error configuration is
normal. Then we shift around a hypothetical regression plane along with its associated
set of parallel isoprobability planes. In each position the probability density of the observed
sample of points is evaluated by examining the isoprobability plane on which each point
lies, and multiplying these together. That hypothetical regression which maximizes this
likelihood is chosen. The algebra resembles the simple case in Section 12-10; it is easy to
show that this results in minimizing the sum of squares (13-3).



Table 13-2 Least Squares Estimates for Multiple Regression of Y on X and Z

(x = X - mean of X; z = Z - mean of Z; fitted Y = 60 + .06893x + .6028z)

    Y      X     Z       x     z        Yx      Yz       x^2   z^2     xz   Fitted Y

   40    100    36    -300     1   -12,000      40    90,000     1   -300     39.9
   45    200    33    -200    -2    -9,000     -90    40,000     4    400     45.0
   50    300    37    -100     2    -5,000     100    10,000     4   -200     54.3
   65    400    37       0     2         0     130         0     4      0     61.2
   70    500    34     100    -1     7,000     -70    10,000     1   -100     66.3
   70    600    32     200    -3    14,000    -210    40,000     9   -600     72.0
   80    700    36     300     1    24,000      80    90,000     1    300     81.3

  Sum of Y = 420     Sum of X = 2800     Sum of Z = 245
  a = mean of Y = 60     mean of X = 400     mean of Z = 35

  Sum of Yx = 19,000     Sum of Yz = -20
  Sum of x^2 = 280,000   Sum of z^2 = 24     Sum of xz = -500

Estimating equations (13-4):

    19,000 = 280,000b - 500c
       -20 =    -500b +  24c

Solution:

    b = .06893
    c = .6028

Thus our regression is Y = a + bx + cz

                         = 60 + .06893x + .6028z

Or, in terms of the original X and Z

    Y = 60 + .06893(X - mean of X) + .6028(Z - mean of Z)
      = 60 + .06893(X - 400) + .6028(Z - 35)

    Y = 11.3307 + .06893X + .6028Z

equal to zero (or algebraically by a technique similar to that used in
Appendix 11-1). The result is the following three estimating equations:

$$\begin{aligned} a &= \bar{Y} \\ \sum Y_i x_i &= b \sum x_i^2 + c \sum x_i z_i \\ \sum Y_i z_i &= b \sum x_i z_i + c \sum z_i^2 \end{aligned} \tag{13-4}$$

Again, note that the intercept estimate a is the mean of Y. The second and
third equations may be solved for b and c. These calculations are shown in
Table 13-2, and yield the fitted multiple regression equation.
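
The arithmetic of Table 13-2 can be reproduced in a few lines. The following
Python sketch is an added illustration; it forms the sums that enter the
estimating equations (13-4) from the Table 13-1 observations and solves the
two equations for b and c by Cramer's rule. The variable names (`Syx`, `Sxz`,
and so on) are assumptions made here for readability.

```python
# Observations from Table 13-1.
Y = [40, 45, 50, 65, 70, 70, 80]          # wheat yield (bu/acre)
X = [100, 200, 300, 400, 500, 600, 700]   # fertilizer (lb/acre)
Z = [36, 33, 37, 37, 34, 32, 36]          # rainfall (inches)

n = len(Y)
x_bar, z_bar = sum(X) / n, sum(Z) / n
x = [Xi - x_bar for Xi in X]   # deviations from the mean
z = [Zi - z_bar for Zi in Z]

# Sums appearing in the estimating equations (13-4).
Syx = sum(Yi * xi for Yi, xi in zip(Y, x))
Syz = sum(Yi * zi for Yi, zi in zip(Y, z))
Sxx = sum(xi * xi for xi in x)
Szz = sum(zi * zi for zi in z)
Sxz = sum(xi * zi for xi, zi in zip(x, z))

# First equation: the intercept (in deviation form) is the mean of Y.
a = sum(Y) / n

# Second and third equations, solved by Cramer's rule:
#   Syx = b*Sxx + c*Sxz
#   Syz = b*Sxz + c*Szz
det = Sxx * Szz - Sxz ** 2
b = (Syx * Szz - Syz * Sxz) / det
c = (Syz * Sxx - Syx * Sxz) / det

# Intercept when the regression is written in the original X and Z.
intercept = a - b * x_bar - c * z_bar

print(f"a = {a:.1f}, b = {b:.5f}, c = {c:.4f}, intercept = {intercept:.2f}")
```

The small discrepancy between the intercept computed this way and the 11.3307
shown in the table arises because the table uses the rounded values of b and c.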



PROBLEMS

13-1 Suppose a random sample of 5 families yielded the following data (an
extension of Problem 11-1)

    Family     Savings S     Income Y     Assets W

      A         $  600       $ 8,000      $12,000
      B          1,200        11,000        6,000
      C          1,000         9,000        6,000
      D            700         6,000        3,000
      E            300         6,000       18,000

(a) Estimate the multiple regression equation of S on Y and W.

(b) Does the coefficient of Y differ from the answer to Problem 11-1(a)?
Which coefficient better illustrates the relation of S to Y?

(c) For a family with assets of $5000 and income of $8000, what would
you predict savings to be?

(d) Calculate the residual sum of squares, and residual variance $s^2$.

(e) Are you satisfied with the degrees of freedom you have for $s^2$ in
this problem? Explain.

13-2 Suppose a random sample of 5 families yielded the following data (an
extension of Problem 11-1)

                                               Number of
    Family     Savings S     Income Y         Children N

      A         $  600       $ 8,000              5
      B          1,200        11,000              2
      C          1,000         9,000              1
      D            700         6,000              3
      E            300         6,000              4

(a) Estimate the multiple regression of S on Y and N.

(b) For a family with 5 children and income of $6000, what would
you predict savings to be?

*13-3 Combining the data of Problems 13-1 and 13-2, we obtain the following
table

                                                            Number of
    Family     Savings S     Income Y     Assets W         Children N

      A         $  600       $ 8,000      $12,000              5
      B          1,200        11,000        6,000              2
      C          1,000         9,000        6,000              1
      D            700         6,000        3,000              3
      E            300         6,000       18,000              4

Measuring the independent variables as deviations from the mean, we
wish to estimate the regression equation

$$S = \alpha + \beta y + \gamma w + \delta n$$

(a) Generalizing (13-4), use the least squares criterion to derive the
system of 4 equations needed to estimate the four parameters.

(b) Using a table such as Table 13-2, calculate the estimates of the four
parameters.



13-4 MULTICOLLINEARITY 

(a) In Simple Regression 

In Figure 12-5a it was shown how our estimate b became unreliable if
the $X_i$'s were closely bunched, i.e., if the regressor X had little variation. It
will be instructive to consider the limiting case, where the $X_i$'s are concentrated
on one single value $X_0$, as in Figure 13-2. Then b is not determined at all.
There are any number of differently sloped lines passing through $(\bar{X}, \bar{Y})$ which
fit equally well; for each line in Figure 13-2, the sum of squared deviations
is the same, since the deviations are measured vertically from $(\bar{X}, \bar{Y})$. This
geometric fact has an algebraic counterpart. If all $X_i = \bar{X}$, then all $x_i = 0$,
and the term involving b in (11-10) is zero; hence the sum of squares does not
depend on b at all. It follows that any b will do equally well in minimizing the
sum of squares. An alternative way of looking at the same problem is that
since all $x_i$ are zero, $\sum x_i^2$ in the denominator of (11-16) is zero, and b is not
defined.

FIG. 13-2 Degenerate regression, because of no spread (variation) in X.

In conclusion, when the values of X show little or no variation, then the
effect of X on Y can no longer be sensibly investigated. But if the problem is
predicting Y, rather than investigating Y's dependence on X, this bunching
of the X values doesn't matter provided we stick to this same value of X. All
the lines in Figure 13-2 predict Y equally well. The best prediction is $\bar{Y}$, and
all these lines give us that result.



(b) In Multiple Regression 

Again consider the limiting case where the values of the independent
variables X and Z are completely bunched up on a line L, as in Figure 13-3.
This means that all the observed points in our scatter lie in the vertical plane
running up through L. You can think of three-dimensional space as a room
in a house; the observations are not scattered throughout this room, but
instead lie embedded in an extremely thin pane of glass standing vertically
on the floor.

FIG. 13-3 Multicollinearity.

In explaining Y, multicollinearity makes us lose one dimension. In the
earlier case of simple regression, our best fit for Y was not a line, but rather
a point $(\bar{X}, \bar{Y})$; in this multiple regression case our best fit for Y is not a plane,
but rather the line F. To get F, just fit the least squares line through the
points on the vertical pane of glass. The problem is identical to the one shown
in Figure 11-2; in one case a line is fitted on a flat pane of glass, in the
other case, on a flat piece of paper. This regression line F is therefore our best
fit for Y. As long as we stick to the same combination of X and Z (i.e., so long
as we confine ourselves to predicting Y values on that pane of glass) no
special problems³ arise. We can use the regression F on the glass to predict Y
in exactly the same way as we did in the simple regression analysis of Chapter
11. But there is no way to examine how X affects Y. Any attempt to define $\beta$,
the marginal effect of X on Y (holding Z constant), involves moving off that
pane of glass, and we have no sample information whatsoever on what the
world out there looks like. Or, to put it differently, if we try to explain Y with
a plane, rather than a line F, we find there are any number of planes
running through F (e.g., $\pi_1$ and $\pi_2$) which do an equally good job. Since each
passes through F, each yields an identical sum of squared deviations; thus
each provides an equally good fit. This is confirmed in the algebra in the
normal equations (13-4). When X is a linear function of Z (i.e., when x is a linear
function of z) it may be shown that the last two equations are not independent,
and cannot be solved uniquely for b and c.⁴
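
The loss of independence in the last two equations of (13-4) shows up as a zero
determinant. The Python sketch below is an added illustration: it compares the
determinant $\sum x^2 \sum z^2 - (\sum xz)^2$ for the deviations of Table 13-2 with the same
quantity for a hypothetical rainfall series that is an exact multiple of
fertilizer. The multiple 3 and the function name are assumptions made only for
this example.

```python
def normal_equations_determinant(x, z):
    """Determinant of the two equations in (13-4) that determine b and c."""
    sxx = sum(xi * xi for xi in x)
    szz = sum(zi * zi for zi in z)
    sxz = sum(xi * zi for xi, zi in zip(x, z))
    return sxx * szz - sxz ** 2


# Deviations from the mean, as in Table 13-2.
x = [-300, -200, -100, 0, 100, 200, 300]
z_table = [1, -2, 2, 2, -1, -3, 1]

# Hypothetical perfectly collinear regressor: z is an exact multiple of x.
z_collinear = [3 * xi for xi in x]

det_table = normal_equations_determinant(x, z_table)
det_collinear = normal_equations_determinant(x, z_collinear)

print("Table 13-2 data:       ", det_table)      # nonzero: unique b and c
print("perfectly collinear z: ", det_collinear)  # zero: no unique solution
```

With a zero determinant, Cramer's rule would require division by zero, which is
the algebraic counterpart of the many planes that fit equally well.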

Now let's be less extreme in our assumptions and consider the
near-limiting case, where z and x are almost on a line (i.e., where all our
observations in the room lie very close to a vertical pane of glass). In this case, a
plane may be fitted to our observations, but the estimating procedure is very
unstable; it becomes very sensitive to random errors, reflected in large
variance of the estimators b and c. Thus, even though X may really affect Y,
its statistical significance can't be established because the standard deviation
of b is so large. This is analogous to the argument in the simple regression
case in Section 12-6.

3 In practice, there would be a problem in getting the regression line F, since computer
routines typically break down in the face of perfect multicollinearity.

4 Two equations can usually be solved for two unknowns, but not always. For example,
suppose that John's age (X) is twice Harry's (Y). Then we can write

X = 2Y

or

5X = 10Y (13-5)

Note that these two equations tell us the same thing. We have two equations with two
unknowns, but they don't generate a unique solution, because they don't give us
independent information. The second just restates what the first told us.

When the independent variables X and Z are collinear, or nearly so, it is
called the problem of multicollinearity. For prediction purposes, it does not
hurt provided there is no attempt to predict for values of X and Z removed
from their line of collinearity. But structural questions cannot be answered:
the relation of Y to either X or Z cannot be sensibly investigated.



Example 1

In our wheat yield example, suppose that X is (as before) the amount of
fertilizer measured in pounds per acre, and that the statistician makes the
incredibly foolish error of defining another independent variable Z as the
amount of fertilizer measured in ounces per acre. Since any weight measured
in ounces must be sixteen times its measurement in pounds:

$$Z = 16X \tag{13-6}$$

exactly. Thus all combinations of X and Z must fall on this straight line, and
we have an example of perfect multicollinearity. Now if we try to fit⁵ a
regression plane to the observations of yield and fertilizer given in Table 11-1,
one possible answer would be our original regression given in (11-18):

$$Y = 32.8 + .068X + 0Z \tag{13-7}$$

But an equally satisfactory solution would follow from substituting (13-6)
into (13-7):

$$Y = 32.8 + 0X + .00425Z$$

Another equivalent answer would be to make a partial substitution for X in
(13-7) as follows:

$$\begin{aligned} Y &= 32.8 + .068[\lambda X + (1 - \lambda)X] \\ &= 32.8 + .068[\lambda X + (1 - \lambda)(1/16)Z] \\ Y &= 32.8 + .068\lambda X + .00425(1 - \lambda)Z \end{aligned} \tag{13-8}$$

(13-8) is a whole family of planes depending on the arbitrary value assigned
to $\lambda$. In fact, all these three-dimensional planes are equivalent expressions for
our simple two-dimensional relationship between fertilizer and yield. While
all give the same correct prediction of Y, no meaning can be attached to
whatever coefficients of X and Z we may come up with.

5 The computer program would probably "hang up" trying to divide by zero. So we
suppose the calculations are handcrafted.
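
That every member of the family (13-8) predicts the same yield can be verified
directly. The Python sketch below is an added illustration; the function name
`predict`, the trial values of $\lambda$, and the fertilizer level of 400 lb/acre are
assumptions chosen only for the example.

```python
def predict(X, lam):
    """Prediction from the plane (13-8) for fertilizer X (lb/acre)."""
    Z = 16 * X  # the same fertilizer measured in ounces per acre, as in (13-6)
    return 32.8 + 0.068 * lam * X + 0.00425 * (1 - lam) * Z


# Very different coefficients on X and Z, depending on lambda ...
for lam in (0.0, 0.25, 1.0, 3.0):
    coef_x = 0.068 * lam
    coef_z = 0.00425 * (1 - lam)
    print(f"lambda = {lam:4.2f}: coefficient of X = {coef_x:7.4f}, "
          f"coefficient of Z = {coef_z:8.5f}, "
          f"prediction at X = 400: {predict(400, lam):.2f}")

# ... yet the prediction is the same for every lambda.
predictions = [predict(400, lam) for lam in (0.0, 0.25, 1.0, 3.0)]
```

The coefficients change with every choice of $\lambda$, even in sign, while the
prediction does not; this is why no meaning can be attached to the separate
coefficients.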

Example 2 

While the previous extreme example may have clarified some of the 
theoretical issues, no statistician would make that sort of error in model 
specification. Instead, more subtle difficulties arise. In economics, for example, 
suppose demand for a group of goods is being related to prices and income, 
with the overall price index being the first independent variable. Suppose 
aggregate income measured in money terms is the second independent 
variable. Since this is real income multiplied by the same price index, the 
problem of multicollinearity may become a serious one. The solution is to 
use real income, rather than money income, as the second independent 
variable. This is a special case of a more general warning: in any multiple 
regression in which price is one independent variable, beware of other 
independent variables measured in prices. 

The problem of multicollinearity may be solved if there happens to be
prior information about the relation of $\beta$ and $\gamma$. For example, if it is known
a priori that

$$\gamma = 5\beta \tag{13-9}$$

then this information will allow us to uniquely determine the regression plane,
even in the case of perfect collinearity. This is evident from the geometry of
Figure 13-3. Given a fixed relation between our two slopes ($\beta$ and $\gamma$) there is
only one regression plane $\pi$ which can be fitted to pass through F. This is
confirmed algebraically. Using (13-9), our model (13-2) can be written

$$\begin{aligned} Y_i &= \alpha + \beta x_i + 5\beta z_i + e_i && \text{(13-10)} \\ &= \alpha + \beta(x_i + 5z_i) + e_i && \text{(13-11)} \end{aligned}$$

It is natural to define a new variable

$$w_i = x_i + 5z_i \tag{13-12}$$

Thus (13-11) becomes

$$Y_i = \alpha + \beta w_i + e_i \tag{13-13}$$

and a regression of Y on w will yield estimates a and b. Finally, if we wish an
estimate of $\gamma$, it is easily computed using (13-9):

$$c = 5b \tag{13-14}$$
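
The steps (13-12) to (13-14) can be sketched in Python as follows. This is an
added illustration only. It borrows the deviations x and z from Table 13-2,
which are not in fact collinear, and it simply supposes that the restriction
$\gamma = 5\beta$ of (13-9) were known a priori; the resulting numbers therefore
illustrate the mechanics and are not estimates the text endorses.

```python
# Yield and deviations of fertilizer and rainfall, from Table 13-2.
Y = [40, 45, 50, 65, 70, 70, 80]
x = [-300, -200, -100, 0, 100, 200, 300]
z = [1, -2, 2, 2, -1, -3, 1]

# Hypothetical prior information (13-9): gamma = 5 * beta.
RATIO = 5

# New regressor (13-12). It sums to zero because x and z are deviations.
w = [xi + RATIO * zi for xi, zi in zip(x, z)]

# Simple regression of Y on w, as in (13-13).
a = sum(Y) / len(Y)
b = sum(Yi * wi for Yi, wi in zip(Y, w)) / sum(wi * wi for wi in w)

# Estimate of gamma recovered from the restriction, as in (13-14).
c = RATIO * b

print(f"a = {a:.1f}, b = {b:.5f}, c = {c:.5f}")
```

Only one slope is estimated from the data; the second follows from the prior
restriction, which is what removes the indeterminacy.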
13-5 INTERPRETING AN ESTIMATED REGRESSION

Suppose the multiple regression

$$Y = a + b_1X_1 + b_2X_2 + b_3X_3 + b_4X_4$$

is fitted to 25 observations of Y and the X's. The least squares estimates often
are published in the form, for example:

    Y =  10.6      + 28.4 X_1    + 4.0 X_2    + 12.7 X_3    + .84 X_4      (13-15)
      (s_0 = 2.6)  (s_1 = 11.4)  (s_2 = 1.5)  (s_3 = 14.1)  (s_4 = .76)
      (t_0 = 4.1)  (t_1 = 2.5)   (t_2 = 2.6)  (t_3 = .9)    (t_4 = 1.1)

The bracketed information is used in assessing the reliability of the least
squares fit, either in a confidence interval or hypothesis test.

The true effect of $X_1$ on Y is the unknown population parameter $\beta_1$;
we estimate it with the sample estimator $b_1$. While the unknown $\beta_1$ is fixed,
our estimator $b_1$ is a random variable, differing from sample to sample. The
properties of $b_1$ may be established, just as the properties of b were established
in the previous chapter. Thus $b_1$ may be shown to be normal, again provided
the sample size is large, or the error term is normal. $b_1$ can also be shown to
be unbiased, with its mean $\beta_1$. The magnitude of error involved in estimation
is reflected in the standard deviation of $b_1$ which, let us suppose, is estimated
to be $s_1 = 11.4$ as given in the first bracket below equation (13-15), and
shown in Figure 13-4. When $b_1$ is standardized with this estimated standard
deviation, it will have a t distribution.

FIG. 13-4 Distribution of the estimator $b_1$ (centered on the true, unknown $\beta_1$;
estimated standard deviation of $b_1$ = 11.4).

To recapitulate: we don't know $\beta_1$; all we know is that whatever it may
be, our estimator $b_1$ is distributed around it, as shown in Figure 13-4. This
knowledge of how closely $b_1$ estimates $\beta_1$ can, of course, be "turned around"
to infer a 95 percent confidence interval for $\beta_1$ from our observed sample $b_1$
as follows:

$$\begin{aligned} \beta_1 &= b_1 \pm t_{.025}\, s_1 \\ &= 28.4 \pm 2.09(11.4) \\ &= 28.4 \pm 23.8 \end{aligned} \tag{13-16}$$

[n = 25 is the sample size, k = 5 is the number of parameters already
estimated in (13-15), and $t_{.025}$ is the critical t value with n - k degrees of
freedom.] Similar confidence intervals can be constructed for the other $\beta$'s.
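
The calculation in (13-16) is easily mechanized. The Python sketch below is an
added illustration; the function name `confidence_interval` is an assumption,
and the critical value 2.09 (for n - k = 20 degrees of freedom) is taken from
the text rather than computed.

```python
def confidence_interval(estimate, std_dev, t_crit):
    """Interval estimate +/- t_crit * std_dev, as in (13-16)."""
    half_width = t_crit * std_dev
    return estimate - half_width, estimate + half_width


T_025 = 2.09  # critical t value with 20 degrees of freedom, from the text

# Coefficient of X_1 in (13-15): b_1 = 28.4 with s_1 = 11.4.
lower, upper = confidence_interval(28.4, 11.4, T_025)
print(f"95% confidence interval for beta_1: {lower:.1f} to {upper:.1f}")
```

The same function applies to the other coefficients once their estimates and
standard deviations are substituted.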

If we turn to testing hypotheses, extreme care is necessary to avoid very
strange conclusions. Suppose it has been concluded on theoretical grounds
that $X_1$ should positively influence Y, and we wish to see if we can statistically
confirm this relation. This involves a one-tailed test of the null hypothesis,

$$H_0: \beta_1 = 0 \tag{13-17}$$

against the alternative

$$H_a: \beta_1 > 0 \tag{13-18}$$

If $H_0$ is true, $b_1$ will be centered on $\beta_1 = 0$, and there will be only a 5%
probability of observing a t value exceeding 1.72; this defines our rejection
region in Figure 13-5a. Our observed t value [2.5 as shown below equation
(13-15)] falls in this region; hence we reject $H_0$, thus confirming (at a 5%
significance level) that Y is positively related to $X_1$.

FIG. 13-5 (a) Test of $\beta_1$. (b) Test of the other $\beta$'s.

The similar t values [also shown for the other estimators below (13-15)]
can be used for testing the null hypothesis on the other $\beta$ parameters. As we
see in Figure 13-5b, the null hypothesis $\beta_2 = 0$ can also be rejected, but a
similar conclusion is not warranted for $\beta_3$ and $\beta_4$. We conclude therefore that
the results are "statistically significant" for $X_1$ and $X_2$; the evidence is that Y
is related to each. But the results are not statistically significant for $X_3$ and $X_4$.
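
The one-tailed tests just described amount to comparing each t ratio with the
critical value 1.72. The Python sketch below is an added illustration covering
the first three slope coefficients of (13-15); the dictionary layout and
variable names are assumptions, and the critical value is the one quoted in the
text.

```python
T_CRITICAL = 1.72  # one-tailed 5% critical t value quoted in the text

# (estimate, estimated standard deviation) for slopes reported in (13-15).
coefficients = {
    "X1": (28.4, 11.4),
    "X2": (4.0, 1.5),
    "X3": (12.7, 14.1),
}

results = {}
for name, (estimate, std_dev) in coefficients.items():
    t_ratio = estimate / std_dev            # t value under H0: beta = 0
    results[name] = t_ratio > T_CRITICAL    # True means "reject H0"
    verdict = "reject H0" if results[name] else "do not reject H0"
    print(f"{name}: t = {t_ratio:.2f} -> {verdict}")
```

A "do not reject" verdict is only a statement that the evidence is weak; as the
discussion that follows stresses, it is not proof that the coefficient is zero.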

As long as we confine ourselves to rejecting hypotheses, as with $\beta_1$ and
$\beta_2$, we won't encounter too much difficulty. But if we accept the null
hypothesis about $\beta_3$ and $\beta_4$, we may run into a lot of trouble of the sort first
encountered in Chapter 9. Since this is so important in regression analysis, the
argument is reviewed for emphasis.

While it is true, for example, that our t coefficient for $X_3$ (.9) is not
"statistically significant," this does not prove there is no relationship between
$X_3$ and Y. It is easy to see why. Suppose that we have strong theoretical
grounds for believing that Y is positively related to $X_3$. In (13-15) this belief
is confirmed: Y is related to $X_3$ by a positive coefficient. Thus our statistical
evidence is consistent with our prior belief (even though it is not as strong as
we might like it to be).⁶ To accept the null hypothesis $\beta_3 = 0$ and conclude
that $X_3$ doesn't affect Y, would be in direct contradiction to both our prior
belief and the statistical evidence. We would be reversing a prior belief even
though the statistical evidence weakly confirmed it. It would have been better
had we not even looked at the evidence. And we note that this remains true
for any positive t value, although as t becomes smaller, our statistical
confirmation becomes weaker. Only if t is zero or negative, do the statistical
results contradict our prior belief.

It follows from this, that if we had strong prior grounds for believing
$X_3$ and $X_4$ to be positively related to Y, they should not be dropped from the
estimating equation (13-15); instead they should be retained, with all the
pertinent information on their t values.

It must be emphasized that those who have accepted hypotheses have
not necessarily erred in this way. But that risk has been run by anyone who
has mechanically accepted a null hypothesis because the t value was not
statistically significant. The difficulty is especially acute, as in the case we've
cited, when the null hypothesis was introduced strictly for convenience
(because it was simple), and not because there is any reason to believe it in
the first place. It becomes less acute if there is some expectation that $H_0$ is
true, i.e., if there are theoretical grounds for concluding that Y and X are
unrelated. Suppose for illustration that we expect a priori that $H_0$ is true; in
such a case, a weak observed relationship (e.g., t = .6) would be in some
conflict with our prior expectation of no relationship. But it is not a serious
conflict, and easily explained by chance. Hence resolving it in favor of our
prior expectation and continuing to use $H_0$ as a working hypothesis might be
a reasonable judgment.



6 Perhaps because of too small a sample. Thus 12.7 may be a very accurate description of
how Y is related to $X_3$; but our t value is not statistically significant because our sample
is small, and the standard deviation of our estimator ($s_3 = 14.1$) is large as a consequence.

We conclude once again, that classical statistical theory provides
incomplete grounds for accepting $H_0$; acceptance must be based also on
extrastatistical judgment, with prior belief playing a key role.

Prior belief plays a less critical role in the rejection of an hypothesis; but
it is by no means irrelevant. Suppose, for example, that although you believed
Y to be related to $X_1$, $X_3$, and $X_4$, you didn't really expect it to be related to
$X_2$; someone had just suggested that you "try on" $X_2$ at a 5% level of
significance. This means that if $H_0$ (no relation) is true, there is a 5% chance of
ringing a false alarm. If this is the only variable "tried on," then this is a risk
we can live with. However, if many such variables are "tried on" in a multiple
regression the chance of a false alarm increases dramatically.⁷ Of course, this
risk can be kept small by reducing the level of error for each t test from 5
to 1% or less. This has led some authors to suggest a 1% level of significance
with the variables just being "tried on," and a 5% level of significance with
the other variables expected to affect Y. Using this criterion we would
conclude that the relation of Y and $X_1$ is statistically significant; but the
relation of Y to $X_2$ is not, despite its higher t value, because there are no
prior grounds for believing it.⁸

To sum up: hypothesis tests require 

1. Good judgment, and good prior theoretical understanding of the
model being tested; 

2. An understanding of the assumptions and limitations of the statistical 
techniques. 

PROBLEMS 

13-4 Suppose a multiple regression of Y on three independent variables
yields the following estimate, based on a sample of n = 30:

                          Y = 25.1    + 1.2 X1    + 1.0 X2    - 0.50 X3
Standard deviations          (2.1)      (1.5)       (1.3)       (.060)
t-values                     (11.9)     (   )       (   )       (   )
95% confidence limits        (±4.3)     (   )       (   )       (   )



7 Suppose, for simplicity, that the t tests for the significance of the several variables (say k
of them) were independent. Then the probability of no error at all is $(.95)^k$. For k = 10,
for example, this is .60, making the probability of some error (some false alarm) as high
as .40.
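
The arithmetic in footnote 7 can be checked with a short sketch. The function name `prob_some_false_alarm` is ours, not the text's, and the calculation rests on the same simplifying assumption as the footnote: the k tests are independent and the null hypothesis holds for every variable "tried on."

```python
def prob_some_false_alarm(k, level=0.05):
    """Probability of at least one false alarm in k independent tests.

    Assumes, as footnote 7 does, that the k tests are independent and
    that the null hypothesis is true for every variable tried on.
    """
    return 1.0 - (1.0 - level) ** k


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        at_5 = prob_some_false_alarm(k, 0.05)
        at_1 = prob_some_false_alarm(k, 0.01)
        print(f"k = {k:2d}: 5% level -> {at_5:.2f}, 1% level -> {at_1:.2f}")
```

Comparing the two columns shows why a stricter level for the "tried on" variables keeps the overall risk of a false alarm small.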

8 Anyone who thinks he would never wish to use such a double standard might suppose that
Y is the U.S. price level, $X_1$ is U.S. wages, and $X_2$ the number of rabbits in South Australia.
With the t values shown in equation (13-15), what would he do?
(a) Fill in the blank spaces in the above estimate.

(b) The following are either true or false. If false, correct. 

(1) The coefficient of $X_1$ is estimated to be 1.2. Other scientists
might collect other samples and calculate other estimates. The
distribution of these estimates would be centered around the true
value of 1.2. Therefore the estimator is called unbiased.

(2) If there were strong prior reasons for believing that $X_1$ does not
influence Y, it is reasonable to reject the null hypothesis $\beta_1 = 0$ at
the 5% level of significance.

(3) If there were strong prior reasons for believing that $X_2$ does
influence Y, it is reasonable to use the estimated coefficient 1.0
rather than accept the null hypothesis $\beta_2 = 0$.



13-6 DUMMY VARIABLES 

There are two major categories of statistical information: cross section 
and time series. For example, econometricians estimating the consumption 
function 9 sometimes use a detailed breakdown of the consumption of in- 
dividuals at various income levels at one point in time (cross section); 
sometimes they examine how total consumption is related to national 
income over a number of time periods (time series); and sometimes they use
a combination of the two. In this section we develop a method that is especially
useful in analysing time series data; as we shall see, it also has important
applications in cross-section studies.



(a) Introductory Example 

Suppose we wish to investigate how the public purchase of government
bonds (B) is related to national income (Y). A hypothetical scatter of annual
observations of these two variables is shown for Canada in Figure 13-6, and
in Table 13-3. It is immediately evident that the relationship of bonds to
income follows two distinct patterns: one applying in wartime (1940-5),
the other in peacetime.

9 i.e., how consumption expenditures are related to income.

FIG. 13-6 Hypothetical scatter of public purchases of bonds (B) and national income (Y, in billions of dollars).

The normal relation of B to Y (say $L_1$) is subject to an upward shift ($L_2$)
during wartime; heavy bond purchases in those years are explained not by Y
alone, but also by the patriotic wartime campaign to induce public bond
purchases. B therefore should be related to Y and another variable, war (W).
But this is only a categorical, or indicator, variable. It does not have a whole
range of values, but only two: on the one hand, we arbitrarily set its value
at 1 for all wartime years; on the other hand we set its value at 0 for all
peacetime years. Since W is either "on" or "off," it is referred to as a "counter"
or "dummy" variable. Our model is:

$$B = \alpha + \beta Y + \gamma W + e \tag{13-19}$$

where

$W = 1$ for wartime years,
$W = 0$ for peacetime years.

This single equation is seen to be equivalent to the following two equations:

$$B = \alpha + \beta Y + \gamma + e \quad \text{for wartime} \tag{13-20}$$

$$B = \alpha + \beta Y + e \quad \text{for peacetime} \tag{13-21}$$

W may also be called a "switching" variable. With war and peace, we 
switch back and forth between (13-20) and (13-21). 

We note that $\gamma$ represents the effect of wartime on bond sales; and $\beta$
represents the effect of income changes. (The latter is assumed to remain the
same in war or peace.) The important point to note is that one multiple
regression of B on Y and W as in (13-19) will yield the two estimated lines
shown in Figure 13-6; $L_1$ is the estimate of the peacetime function (13-21), and
$L_2$ is the estimate of the wartime function (13-20).

Complete calculations for our example are set out in Table 13-3, and the
procedure is interpreted in Figure 13-7. Since all observations are W = 0,
or W = 1, the scatter is spread only in the two vertical planes $\pi_1$ and $\pi_2$.
Estimation involves a multiple (least squares) regression fit of (13-19) to this


Table 13-3 Calculations for Regression of B on Y and W, where W is a Dummy Variable.

Year     B      Y     W    y = Y - Ȳ   w = W - W̄     yw       By       Bw      y²      w²
1933    2.6    2.4    0     -4.44       -.35         1.55   -11.54     -.91   19.71    .12
1934    3.0    2.8    0     -4.04       -.35         1.41   -12.12    -1.05   16.32    .12
1935    3.6    3.1    0     -3.74       -.35         1.31   -13.46    -1.26   13.99    .12
1936    3.7    3.4    0     -3.44       -.35         1.20   -12.73    -1.29   11.83    .12
1937    3.8    3.9    0     -2.94       -.35         1.03   -11.17    -1.33    8.64    .12
1938    4.1    4.0    0     -2.84       -.35         0.99   -11.64    -1.43    8.07    .12
1939    4.4    4.2    0     -2.64       -.35         0.92   -11.62    -1.54    6.97    .12
1940    7.1    5.1    1     -1.74        .65        -1.13   -12.35     4.62    3.03    .42
1941    8.0    6.3    1      -.54        .65         -.35    -4.32     5.20     .29    .42
1942    8.9    8.1    1      1.26        .65          .82    11.21     5.78    1.59    .42
1943    9.7    8.8    1      1.96        .65         1.27    19.01     6.30    3.84    .42
1944   10.2    9.6    1      2.76        .65         1.79    28.15     6.63    7.62    .42
1945   10.1    9.7    1      2.86        .65         1.86    28.89     6.56    8.18    .42
1946    7.9    9.6    0      2.76       -.35         -.97    21.80    -2.77    7.62    .12
1947    8.7   10.4    0      3.56       -.35        -1.25    30.97    -3.05   12.67    .12
1948    9.1   12.0    0      5.16       -.35        -1.81    46.96    -3.19   26.63    .12
1949   10.1   12.9    0      6.06       -.35        -2.12    61.21    -3.53   36.72    .12

(The wartime years are 1940 to 1945.)

ΣB = 115    ΣY = 116.3    ΣW = 6    Σyw = 6.52    ΣBy = 147.25    ΣBw = 13.74    Σy² = 193.72    Σw² = 3.84

B̄ = 115/17 = 6.76    Ȳ = 116.3/17 = 6.84    W̄ = 6/17 = .35

Estimating equations (13-4):

$$\sum By = b \sum y^2 + c \sum yw$$
$$\sum Bw = b \sum yw + c \sum w^2$$

or

$$147.25 = 193.72\,b + 6.52\,c$$
$$13.74 = 6.52\,b + 3.84\,c$$

Solution: b = .68, c = 2.43

Thus our estimated regression is: B = 6.76 + .68y + 2.43w
Or, expressed in terms of the original variables:
B = 6.76 + .68(Y - Ȳ) + 2.43(W - W̄)
  = 6.76 + .68(Y - 6.84) + 2.43(W - .35)

B = 1.26 + .68Y + 2.43W




FIG. 13-7 Multiple regression with a dummy variable (W). 

scatter. The resulting fitted plane

$$B = a + bY + cW \tag{13-22}$$

can be visualized as a plane resting on its two supporting buttresses $\pi_1$ and $\pi_2$.
The slopes of $L_1$ and $L_2$ are (by assumption) equal 10 to the common value
b, and c is the estimated wartime shift.
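
The hand calculation of Table 13-3 can be retraced with a minimal sketch. The helper name `solve_estimating_equations` is ours; it simply solves the two estimating equations by Cramer's rule. The first call uses the sums as printed in the table. The second recomputes the sums from the raw columns without rounding the deviations, which is an extra step not taken in the text; under that assumption the slope is essentially unchanged, but the wartime shift comes out somewhat lower than 2.43, apparently because the table rounds w to .65 and -.35.

```python
def solve_estimating_equations(s_yy, s_yw, s_ww, s_by, s_bw):
    """Solve  s_by = b*s_yy + c*s_yw  and  s_bw = b*s_yw + c*s_ww  for (b, c)."""
    det = s_yy * s_ww - s_yw ** 2
    b = (s_by * s_ww - s_yw * s_bw) / det
    c = (s_yy * s_bw - s_yw * s_by) / det
    return b, c


# Sums as printed in Table 13-3 (computed there from rounded deviations).
b_book, c_book = solve_estimating_equations(193.72, 6.52, 3.84, 147.25, 13.74)

# Raw columns of Table 13-3, 1933-1949; W = 1 marks the wartime years 1940-45.
B = [2.6, 3.0, 3.6, 3.7, 3.8, 4.1, 4.4, 7.1, 8.0,
     8.9, 9.7, 10.2, 10.1, 7.9, 8.7, 9.1, 10.1]
Y = [2.4, 2.8, 3.1, 3.4, 3.9, 4.0, 4.2, 5.1, 6.3,
     8.1, 8.8, 9.6, 9.7, 9.6, 10.4, 12.0, 12.9]
W = [0] * 7 + [1] * 6 + [0] * 4

n = len(B)
B_bar, Y_bar, W_bar = sum(B) / n, sum(Y) / n, sum(W) / n
y = [v - Y_bar for v in Y]  # unrounded deviations from the mean
w = [v - W_bar for v in W]

b_raw, c_raw = solve_estimating_equations(
    sum(u * u for u in y),
    sum(u * v for u, v in zip(y, w)),
    sum(v * v for v in w),
    sum(p * u for p, u in zip(B, y)),
    sum(p * v for p, v in zip(B, w)),
)
a_raw = B_bar - b_raw * Y_bar - c_raw * W_bar  # intercept in original units

print(f"from printed sums: b = {b_book:.2f}, c = {c_book:.2f}")
print(f"from raw data:     a = {a_raw:.2f}, b = {b_raw:.2f}, c = {c_raw:.2f}")
```

Either way, one regression yields both lines: the peacetime line has intercept a, and the wartime line has intercept a + c, with the common slope b.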

10 This restriction means that $L_1$ and $L_2$ are not independently fitted. In other words, our
least squares plane (13-22) is fitted first; $L_1$ and $L_2$ are simply "read off" this plane. Thus
$L_1$ does not represent a least squares fit to the left-hand scatter, nor does $L_2$ represent a
least squares fit to the right-hand scatter.

Thus the dummy variable method of fitting a single multiple regression plane and then
reading off $L_1$ and $L_2$ can be compared to the alternative method of independently fitting
two simple regression lines to the two scatters in Figure 13-7. Our model would be:

$B = \alpha_1 + \beta_1 Y + e_1$ for wartime
$B = \alpha_2 + \beta_2 Y + e_2$ for peacetime,

and the estimated slopes ($\hat{\beta}_1$ and $\hat{\beta}_2$) would generally not be the same.

(cont'd) 
In a dummy variable model, as in any regression problem, it is important
to understand why both variables Y and W must be included. Even
if our only interest is in B and Y, their relationship cannot be properly
estimated unless W is taken into account. In other words, since experimental
control over the "nuisance" variable W is not possible, its effects must
explicitly be removed in the regression analysis. To ignore this variable is to
invite a bias in our estimators, as well as an increased variance. To see how
bias occurs, consider what happens if W is ignored, so that our scatter
involves only the two dimensions B and Y. Geometrically this involves
projecting the three-dimensional scatter in Figure 13-7 onto the two-dimensional
B-Y plane, as in Figure 13-8a. This is immediately recognized as the
same scatter plotted in Figure 13-6; we also reproduce from that diagram
$L_1$ and $L_2$, our estimated multiple regression using W as a dummy variable.
If we calculate $L_3$, the simple regression of B on Y, it clearly has too great a
slope. This upward bias is due to the fact that war years tended to be high
income years: thus on the right-hand side of this scatter, higher bond sales
that should be attributed in part to wartime would be (erroneously) attributed
to income alone.

A similar error is to be expected in any investigation of B and W which
ignores Y. With no Y dimension, our scatter in Figure 13-7 would be
projected onto the B-W plane, as in Figure 13-8b. In this diagram the only
way to estimate the wartime effect is to look at the difference in sample
means, 11 which is too large. This upward bias would be due to the same cause:
higher bond sales that should be attributed in part to higher income would
be (erroneously) attributed to wartime alone.
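
Both biases can be seen numerically with the Table 13-3 data. The sketch below is ours (the helper names `mean` and `simple_slope` are hypothetical); it computes the simple regression slope of B on Y with W ignored, and the difference in sample means with Y ignored. Each comes out larger than the corresponding multiple regression estimate (.68 and 2.43).

```python
# Raw columns of Table 13-3, 1933-1949; W = 1 marks the wartime years 1940-45.
B = [2.6, 3.0, 3.6, 3.7, 3.8, 4.1, 4.4, 7.1, 8.0,
     8.9, 9.7, 10.2, 10.1, 7.9, 8.7, 9.1, 10.1]
Y = [2.4, 2.8, 3.1, 3.4, 3.9, 4.0, 4.2, 5.1, 6.3,
     8.1, 8.8, 9.6, 9.7, 9.6, 10.4, 12.0, 12.9]
W = [0] * 7 + [1] * 6 + [0] * 4


def mean(values):
    return sum(values) / len(values)


def simple_slope(dep, reg):
    """Least squares slope of a simple regression of dep on reg."""
    d_bar, r_bar = mean(dep), mean(reg)
    num = sum((d - d_bar) * (r - r_bar) for d, r in zip(dep, reg))
    den = sum((r - r_bar) ** 2 for r in reg)
    return num / den


# (a) Ignore W: the slope of B on Y alone is too steep.
slope_ignoring_w = simple_slope(B, Y)

# (b) Ignore Y: the wartime effect is read off as a difference in means.
war_b = [b for b, w in zip(B, W) if w == 1]
peace_b = [b for b, w in zip(B, W) if w == 0]
shift_ignoring_y = mean(war_b) - mean(peace_b)

print(f"slope of B on Y alone:        {slope_ignoring_w:.2f}")
print(f"wartime minus peacetime mean: {shift_ignoring_y:.2f}")
```

As footnote 11 notes, the difference in means is the same number a simple regression of B on W would give, so `simple_slope(B, W)` reproduces it.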

This example has illustrated the general nature of dummy variables. 
This can be applied to a wide variety of problems, but one of the most 
useful applications is in removing seasonal shifts in time series data, as 
explained next. 



Estimates of four parameters are required for this model, rather than the three in the
dummy variable model (13-19); thus one advantage of the dummy model is that it conserves
one extra degree of freedom. The disadvantage of the dummy model is that it requires an
additional prior restriction: that the two slopes are equal. But this is not always a
disadvantage. For instance, in our example it may be better to assume the two slopes equal
than to independently fit a wartime function to only five observations. The very small
wartime sample may yield a very unreliable estimate of slope, and it may make better sense
to pool all the data to estimate one slope coefficient.

11 This is equivalent to a simple regression of B on W. Because of the peculiar scatter
involved, this regression line would pass through these two means; thus their difference
represents the effect of W on B.




FIG. 13-8 Error when one explanatory variable is ignored. (a) Biased estimate of slope
(the effect of Y) because the categorical variable W is ignored. (b) Biased estimate of the
effect of W because the numerical variable Y is ignored.
[Panel (b) marks the gap between average B (wartime) and average B (peacetime) as the
estimate of $\gamma$, the effect of wartime on B. Compare this biased estimate (3.45) with the
unbiased estimate (2.43) in $L_2$ in part (a).]



(b) Seasonal Adjustment 

To illustrate, consider a spectacular example from real life. Suppose we 
wish to examine how department store sales of jewelry increase over time. 
When we plot quarterly sales (in Table 13-4) against time as in Figure 13-9a, 
we note how sales shoot up every fourth quarter because of Christmas. 
Since we are interested in the long-term secular increase in sales, these 
strange Christmas observations should be discounted. This calls for a dummy 
Table 13-4 Canadian Jewelry Sales (S) and Seasonal Dummies

        Time T             S
Year    (Quarter Years)    ($100,000's)    Q4    Q3    Q2
1957     1                 24              0     0     0
         2                 29              0     0     1
         3                 29              0     1     0
         4                 50              1     0     0
1958     5                 24              0     0     0
         6                 30              0     0     1
         7                 29              0     1     0
         8                 51              1     0     0
1959     9                 [illegible]     0     0     0
        10                 29              0     0     1
        11                 30              0     1     0
        12                 52              1     0     0
1960    13                 25              0     0     0
        14                 30              0     0     1
        15                 29              0     1     0
        16                 50              1     0     0

Source: Dominion Bureau of Statistics, Ottawa.
variable 12 $Q_4$ (for fourth quarter) so that our model is

$$S = \alpha + \beta_1 T + \beta_4 Q_4 + e \tag{13-23}$$

Even this model may not be adequate. If allowance should also be made for
shifts in the other quarters, dummies $Q_2$ and $Q_3$ should be added. A dummy

12 There are three points in the analysis at which we might conclude that explicit account 
should be taken of seasonal swings. We may expect a strong seasonal influence from prior 
theoretical reasoning. Or, such an influence may be discovered after we plot the scatter. 
Finally, it may be discovered by examining residuals after the regression is fitted. Clearly 
those observations indicated by arrows (in Figure 13-9a) have consistently high residuals. 
To explain this, we look for something they have in common. Their common property is 
that they all occur in the fourth quarter. Hence the fourth quarter is introduced as a dummy
regressor. This technique of "squeezing the residuals till they talk" is important in every 
kind of regression, not just time series; used with discretion, it indicates which further 
regressors may be introduced in order to reduce bias and residual variance. 
FIG. 13-9 Secular growth in Canadian jewelry sales, with and without seasonal adjustment.
(a) Inadequate simple regression of S on T alone. (b) Multiple regression of S on
T, including seasonal adjustment.
[Panel (b) shows the fitted equation S = 24.2 + .075T + 25.5$Q_4$ + 4.4$Q_3$ + 4.7$Q_2$ and the
effect of T alone on S (unbiased slope = .075); the time axis runs from 1957 to 1960.]



$Q_1$ is not needed for the first quarter, because $Q_2$, $Q_3$, and $Q_4$ measure the
shift from a first quarter base. (Whether or not to include the various
regressors $Q_4$, $Q_3$, $Q_2$ can be decided on statistical grounds, by testing for
statistical significance. It is common to include them all in such a test, and
reject or accept them as a group. But such a statistical test on data as extreme
as ours would be superfluous.) Our modified model is now

$$S = \alpha + \beta_1 T + (\beta_4 Q_4 + \beta_3 Q_3 + \beta_2 Q_2) + e \tag{13-24}$$

The least squares fit 13 is graphed in Figure 13-9b. Notice that our
seasonal adjustment is exactly the same every year, i.e., each year there is 

13 The least squares fit to this model was calculated by a method similar to that of Table 
13-3. Equation system (13-4) was extended to a system of 5 estimating equations for the 
5 unknowns. 
the same upward shift ($b_2$) in our fit between the first and second quarters.
(These seasonal shift coefficients need not always be positive, as in our 
example.) 

By contrast, the simple regression of S on T without quarterly adjustment
is graphed in Figure 13-9a. It is a poor fit, with large residual variance. Even
worse, the calculated slope showing the relation of S to T is biased, for the
same reasons as in the bond example of part (a).
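
The contrast between the two fits can be sketched in a few lines. To avoid guessing at any value, the sketch below uses only the first eight quarters (1957 and 1958), whose sales figures appear in the moving average arithmetic of Table 13-5; its coefficients therefore differ from the sixteen-quarter fit quoted in Figure 13-9. The helpers `solve` and `least_squares` are ours: a plain Gauss-Jordan solution of the normal equations, standing in for the extended system of estimating equations mentioned in footnote 13.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def least_squares(rows, dep):
    """Least squares coefficients from the normal equations X'X beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * d for r, d in zip(rows, dep)) for i in range(k)]
    return solve(xtx, xty)


# First eight quarters only (1957-58), in $100,000's.
S = [24, 29, 29, 50, 24, 30, 29, 51]
T = list(range(1, 9))
quarter = [(t - 1) % 4 + 1 for t in T]

# Simple regression of S on T alone: columns are constant, T.
a_simple, slope_simple = least_squares([[1, t] for t in T], S)

# Multiple regression with seasonal dummies: constant, T, Q4, Q3, Q2.
rows = [[1, t, int(q == 4), int(q == 3), int(q == 2)] for t, q in zip(T, quarter)]
a, b1, b4, b3, b2 = least_squares(rows, S)

print(f"slope without dummies: {slope_simple:.3f}")
print(f"slope with dummies:    {b1:.3f}  (Q4 shift {b4:.2f})")
```

With the Christmas quarter sitting at the end of each year, the unadjusted slope is many times larger than the slope estimated alongside the seasonal dummies.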



(c) Seasonal Adjustment without Dummies (Moving Average) 

Dummy variables are not the only means of seasonally adjusting data.
Another common method is to take a moving average (over a whole year) of
the time series, as shown in Table 13-5. Note how the wild seasonal swing at
Christmas is ironed out in this averaging process. The desired relation of sales
to time can now be estimated by a simple regression of seasonally adjusted
S' on T.

It is interesting to compare this method with the dummy variable
alternative. An apparent disadvantage is that a total of three observations
are lost at the beginning and end of the time series, in order to get the
moving average started and finished. However, although it is less evident,
the same loss is involved in using dummy variables, since three degrees of
freedom are lost in estimating the shift coefficients $\beta_2$, $\beta_3$, and $\beta_4$.

An advantage of the moving average method is that it is not necessary
to assume a constant seasonal shift; thus the adjustment for any quarter

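Table 13-5 Moving Average

Time                 S (Unadjusted)
1957   Quarter 1     24
       Quarter 2     29
       Quarter 3     29
       Quarter 4     50
1958   Quarter 1     24
       Quarter 2     30
       Quarter 3     29
       Quarter 4     51

S' (Adjusted by Four Quarter Moving Average), in order through the series:

¼(24 + 29 + 29 + 50) = 33
¼(29 + 29 + 50 + 24) = 33
¼(29 + 50 + 24 + 30) = 33.25
¼(50 + 24 + 30 + 29) = 33.25
¼(24 + 30 + 29 + 51) = 33.5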
varies from year to year. The advantage of dummy variables is that both
seasonal shifts and the relation of S to T are estimated simultaneously in the
same regression. (A moving average adjustment is only the first stage in a
two-step process; only after it is completed can S' be regressed on T.)
Another advantage is that the dummy coefficients ($\beta_2$, $\beta_3$, and $\beta_4$) give an
index of the average seasonal shift, and tests of significance on them can
easily be undertaken using standard procedures.
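
The first stage of the moving average method is easy to sketch. The function name `moving_average` is ours; it averages each run of four consecutive quarters, using the eight quarterly sales figures that enter the arithmetic of Table 13-5.

```python
def moving_average(series, window=4):
    """Average every run of `window` consecutive observations."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Quarterly jewelry sales for 1957 and 1958, in $100,000's.
S = [24, 29, 29, 50, 24, 30, 29, 51]

adjusted = moving_average(S)
print(adjusted)                 # [33.0, 33.0, 33.25, 33.25, 33.5]
print(len(S) - len(adjusted))   # 3 observations are lost
```

Eight observations yield only five adjusted values, which is the loss of three observations noted above.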

PROBLEMS 

13-5 Referring to the jewelry sales in Figure 13-9, predict the sales S for the
next quarter (T = 17, the first quarter of 1961):

(a) Using the simple regression of S on T alone; 

(b) Using the multiple regression of S on T, including seasonal adjust- 
ment. Is this any better than (a)? 

13-6 Referring to the two years of jewelry sales in Table 13-5, 

(a) Compute the simple regression of S' (adjusted) on T;

(b) Compute the simple regression of S (unadjusted) on T; 

(c) Of the 2 slopes in (a) and (b), 

(1) Which do you think better shows the time trend of sales? 

(2) Which agrees more closely with the slope $b_1 = .075$ estimated
by using seasonal dummies? 

13-7 Referring to the jewelry sales in Table 13-4, consider the eight quarters 
from the 4th to the 11th quarter. Supposing this were the only data 
available: 

(a) Fit a simple regression line of S on T, without quarterly adjustment; 

(b) Is your slope estimate (time trend) unbiased? Why? 

13-8 Referring to Figures 13-6 and 13-8a, suppose the last 4 years are 
missing. If a simple regression of B on Y is calculated (ignoring W), 
will the bias of the slope be less or greater than before (when all the 
years were used)? Why? 

13-7 REGRESSION, ANALYSIS OF VARIANCE, AND 
ANALYSIS OF COVARIANCE 

(a) Regression with Dummies Equivalent to Analysis of Variance or 
Analysis of Covariance 

If all the independent variables are categorical (dummy) variables, then 
regression analysis is essentially the familiar analysis of variance (ANOVA). 
This can be proved in general; but it is more instructive to illustrate it in 






the simplest case of one independent dummy variable. In Problem 10-3 we
applied analysis of variance to the problem of whether the income (Y) of
men and women differs. Dummy regression could alternatively have been
used, with a model of the form:

$$Y = \alpha + \beta G + e \tag{13-25}$$

where

G = 0 for men
  = 1 for women.

The data is analyzed in Table 13-6. We find the same value (b = -8)
for the difference in groups that we found in Problem 10-3 ($\bar{Y}_1 - \bar{Y}_2 = -8$).
Note also that in both tests the residual variance (48) is the same; so is the
standard error of estimate ($\sqrt{48}\,\sqrt{1/2}$). Hence the two procedures are seen
to be identical.
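
The equivalence can be verified numerically with the eight incomes of Table 13-6. The sketch below is ours: it fits the dummy regression in deviation form, then computes the same quantities the analysis of variance route would use (the difference in group means and the pooled within-group variance). The pooled variance formula is the standard two-sample one, assumed here to match the solution of Problem 10-3.

```python
from math import sqrt

# Table 13-6: incomes Y, with G = 0 for men and G = 1 for women.
Y = [60, 70, 62, 48, 48, 56, 50, 54]
G = [0, 0, 0, 0, 1, 1, 1, 1]
n = len(Y)

# Dummy regression in deviation form: fitted Y = a + b*g, with g = G - mean(G).
y_bar = sum(Y) / n
g_bar = sum(G) / n
g = [v - g_bar for v in G]
sum_g2 = sum(u * u for u in g)
b = sum(yi * gi for yi, gi in zip(Y, g)) / sum_g2
fitted = [y_bar + b * gi for gi in g]
s2 = sum((yi - fi) ** 2 for yi, fi in zip(Y, fitted)) / (n - 2)  # residual variance
t = b / sqrt(s2 / sum_g2)

# Analysis of variance route: group means and pooled within-group variance.
men = [yi for yi, gi in zip(Y, G) if gi == 0]
women = [yi for yi, gi in zip(Y, G) if gi == 1]
mean_men, mean_women = sum(men) / len(men), sum(women) / len(women)
within = sum((v - mean_men) ** 2 for v in men) + sum((v - mean_women) ** 2 for v in women)
pooled = within / (n - 2)

print(f"b = {b:.0f}, difference in means = {mean_women - mean_men:.0f}")
print(f"residual variance = {s2:.0f}, pooled variance = {pooled:.0f}")
print(f"t = {t:.2f}")
```

The regression coefficient equals the difference in group means, and the residual variance equals the pooled variance, so the t ratio is the same by either route.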

We referred to our earlier example of explaining bond sales, as a regres- 
sion on a numerical variable (income) and a dummy variable (wartime). 
This could alternatively be described as a combination of standard regression 
analysis and analysis of variance. Technically, this combination is referred 
to as analysis of covariance (ANOCOVA), although this term is often 
reserved for cases in which the effect of the dummy variable (wartime) is
of prime interest and the other variable (income) is explicitly introduced
only to remove its noise effects (i.e., to prevent the sort of error shown in
Figure 13-8b).

Another application of the analysis of covariance might be a study of 
the effects of racial discrimination on income; here the major concern would 
be the effect on income of the dummy variable (negro versus white), with a 
simultaneous regression on other numerical variables (years of experience,
education, etc.) simply a means of keeping these other influences from
biasing the result.



(b) Summary 

Multiple regression is an extremely useful tool with many broad applica- 
tions. We define three special cases, distinguished by the nature of the 
independent variables: 

1. "Standard regression" is regression on only numerical variables. 

2. ANOVA is equivalent to regression on only categorical (dummy) 
variables. 

3. ANOCOVA (analysis of covariance) is regression on both categorical
and numerical variables. 



Table 13-6 Regression Using Only a Categorical Variable, Being then Equivalent to the Analysis of Variance

 Y     G    g = G - Ḡ     Yg     g²    Ŷ = 56 - 8g    Y - Ŷ    (Y - Ŷ)²
 60    0     -1/2        -30    1/4        60            0         0
 70    0     -1/2        -35    1/4        60           10       100
 62    0     -1/2        -31    1/4        60            2         4
 48    0     -1/2        -24    1/4        60          -12       144
 48    1      1/2         24    1/4        52           -4        16
 56    1      1/2         28    1/4        52            4        16
 50    1      1/2         25    1/4        52           -2         4
 54    1      1/2         27    1/4        52            2         4

Ȳ = 56 = a    Ḡ = 4/8 = 1/2    ΣYg = -16    Σg² = 2    Σ(Y - Ŷ)² = 288

$$b = \frac{\sum Yg}{\sum g^2} = \frac{-16}{2} = -8$$

$$s^2 = \frac{288}{6} = 48$$

(This is residual variance, also appearing in solution table in Problem 10-3.)

$$t = \frac{b}{s/\sqrt{\sum g^2}} = \frac{-8}{\sqrt{48}/\sqrt{2}} = -1.63 \tag{13-26}$$

which is less extreme than the critical t value
of -2.45, therefore not statistically significant.
These three techniques are compared using the hypothetical data of Figures
13-10 to 13-13, which show the possible ways that mortality may be analyzed.

The hypothetical data in Figure 13-10 show a sample of observations
of the mortality of American men. Applying standard regression, we would
reject the hypothesis that the true slope $\beta = 0$; thus we conclude that age
does affect the mortality rate. In the process we derive a useful estimate b
of how age affects mortality.

If the data is collected into three groups, we come up with the scatter
shown in Figure 13-11. Note that this is exactly the same set of mortality
(Y) observations as in Figure 13-10. The only difference is that we are no
longer as specific about the age (X) variable. Now ANOVA can be applied 14
to this data to test whether the population means of these three scatters are
equal. Once again, the conclusion is that age affects mortality. However,
ANOVA does not tell us how age affects mortality, unless we extend it to
multiple comparisons. Moreover, multiple comparisons will yield a whole
complicated table, whereas standard regression provides a single descriptive
number (b) showing how age affects mortality.

So long as X is numerical, as in Figures 13-10 and 13-11, we conclude
that standard regression is generally the preferred technique. But when X
is categorical, it cannot be applied. For example, in Figure 13-12 we graph
how mortality depends on nationality; 15 our X variable ranges over various
categories (American, British, etc.) and there is no natural way of placing
these on a numerical scale, or even ordering them. Hence standard regression
is out of the question, 16 and ANOVA must be used.

If mortality is dependent on income as well as nationality, the analysis
of covariance shown in Figure 13-13 is appropriate. This uses nationality
dummies, with the numerical variable income explicitly introduced to eliminate
the error that it might otherwise cause. We confirm that this has greatly
improved our analysis. Whereas it appeared in Figure 13-12 that a national
characteristic of the British was a lower mortality rate than the Chinese, we
see in Figure 13-13 that it is not so simple. The heights of the fitted planes for
China and the United Kingdom are practically the same. The lower U.K.
mortality rate is explained solely by higher income.



14 Standard regression could also be applied, with a line fitted to the scatter in Figure 13-11.
However, if this technique is to be applied, it is more efficient to use the ungrouped data
of Figure 13-10.

15 In Figures 13-12 and 13-13, all samples are assumed drawn from a single age group; we
consider only other factors influencing mortality.

16 To confirm, the student will note that a standard linear regression line fitted to the
scatter in Figure 13-12 will yield $b \approx 0$ and the conclusion that nationality does not matter.
Yet if China is graphed last, rather than first, $b \neq 0$ and it would be concluded that
nationality does matter. Thus, the conclusion depends on the arbitrary ordering of our
nationality variable.




FIG. 13-10 "Standard regression," since X is numerical. (Horizontal axis: age, from 25 to 35.)

FIG. 13-11 X is grouped into classifications, and ANOVA may be used. (Horizontal axis: age group, with Younger at 25, Middle at 30, and Older at 35.)

FIG. 13-12 X is categorical, and ANOVA must be used. (Horizontal axis: nationality X, with China, U.S., U.K., U.S.S.R.)

FIG. 13-13 Analysis of covariance for a categorical variable (nationality) and numerical
variable (income). (Axes: nationality, with China, U.S., U.K., U.S.S.R.; per capita income Z.)

In summary, standard regression is the more powerful tool whenever
the independent variable X is numerical and the dependence of Y on X can
be described by a simple function. Analysis of variance is appropriate if the
independent variable is a set of unordered categories.



PROBLEMS 

13-9 Construct a confidence interval for $\beta$ using the data in Table 13-6.
Compare this with the answer to Problem 10-3b.
*13-10 Using the data in Problem 10-2, estimate the regression of yield on
fertilizer type, using two dummies. Compare with your answer to
Problem 10-2.

13-11 The following is the result of a test of gas consumption on a sample
of 6 cars:

            Miles Per Gallon    Engine Horsepower
Make A      21                  210
            18                  240
            15                  310
Make B      20                  220
            18                  260
            15                  320



(a) Determine the difference in the performance (miles per gallon) 
of the two makes, allowing for horsepower differences. 

(b) Graph your results as in Figure 13-13, 
13-12 (a) Based on the following sample information, use the analysis of
covariance to describe how education is related to father's income
and place of residence.
(b) Graph your results.

                 Years of Formal     Father's Income
                 Education (E)       (F)
Urban Sample     15                  $8,000
                 18                  11,000
                 12                   9,000
                 16                  12,000
Rural Sample     13                  $5,000
                 10                   3,000
                 11                   6,000
                 14                  10,000



chapter 14

Correlation 



14-1 SIMPLE CORRELATION 



Simple regression analysis showed us how variables are linearly related;
correlation analysis will show us the degree to which variables are linearly
related. In regression analysis, a whole mathematical function is estimated
(the regression equation); but correlation analysis yields only one number,
an index designed to give an immediate picture of how closely two variables
move together. In correlation analysis, we need not worry about cause and
effect relations. Correlation between X and Y can be estimated regardless
of whether: (a) X affects Y, or vice versa; (b) both affect each other; or (c)
neither directly affects the other, but they move together because some third
variable influences both. Although correlation is a less powerful technique
than regression, the two are so closely related mathematically that correlation
often becomes a useful aid in regression analysis.



(a) The Population Correlation Coefficient $\rho$ (rho)

In equation (5-22) we have already defined a useful index of how two
random variables move together: $\sigma_{XY}$, the covariance of X and Y. The
variables used there were deviations from the mean:

$$X - \mu_X \quad \text{and} \quad Y - \mu_Y \tag{14-1}$$

It will be useful to express these deviations in terms of fully standardized
units; i.e., define the new variables:

$$\frac{X - \mu_X}{\sigma_X} \quad \text{and} \quad \frac{Y - \mu_Y}{\sigma_Y} \tag{14-2}$$

Correlation $\rho_{XY}$ is similar to covariance $\sigma_{XY}$ in (5-22), the only difference
being that the variables in (14-2) replace those in (14-1). Thus

Population correlation:

$$\rho_{XY} = E\left[\left(\frac{X - \mu_X}{\sigma_X}\right)\left(\frac{Y - \mu_Y}{\sigma_Y}\right)\right] \tag{14-3}$$

This will be interpreted more fully in Section 14-1(c) below; for now we turn
our attention to r, the sample correlation coefficient used to estimate this
(generally) unknown population $\rho$.



(b) The Sample Correlation Coefficient r 

By analogy with (14-3)

Sample correlation:

$$r = \frac{1}{n-1} \sum_i \left(\frac{X_i - \bar{X}}{s_X}\right)\left(\frac{Y_i - \bar{Y}}{s_Y}\right) \tag{14-4}$$

Now consider an intuitive development of this index (because of the similarity
of the two concepts, some of this interpretation will closely parallel
the development of covariance in Chapter 5-3). As our example, we use the
marks on a verbal (Y) and mathematical (X) test scored by a sample of eight
college students. Each student's performance is represented by a dot on the
scatter shown in Figure 14-1a; this information is set out in the first two
columns of Table 14-1.

Since we are after a single measure of how closely these variables are
related, our index should be independent of our choice of origin. So we shift
both axes in Figure 14-1b, with both x and y now defined as deviations from
the mean; i.e.,

$$x = X - \bar{X} \quad \text{and} \quad y = Y - \bar{Y} \tag{14-5}$$

Values of the translated variables are shown in columns 3 and 4 of Table 14-1.



FIG. 14-1 Scatter of math and verbal scores. (a) Original observations. (b) Shift axes.
(c) Change scale of axes to standard units.
Table 14-1 Math Score (X) and Corresponding Verbal Score (Y) of a Sample of Eight Students Entering College.

(1)   (2)   (3)          (4)          (5)    (6)    (7)    (8)            (9)      (10)        (11)            (12)      (13)
X     Y     x = X - X̄    y = Y - Ȳ    xy     x²     y²     Ŷ = 50 + bx    Y - Ŷ    (Y - Ŷ)²    X̂ = 60 + b*y    X - X̂     (X - X̂)²
36    35    -24          -15          360    576    225    37.96          -2.96      8.76      48.27           -12.27    150.55
80    65     20           15          300    400    225    60.03           4.97     24.70      71.73             8.27     68.39
50    60    -10           10         -100    100    100    44.99          15.01    225.30      67.82           -17.82    317.55
58    39     -2          -11           22      4    121    49.00         -10.00    100.00      51.39             6.61     43.69
72    48     12           -2          -24    144      4    56.02          -8.02     64.32      58.44            13.56    183.87
60    44      0           -6            0      0     36    50.00          -6.00     36.00      55.31             4.69     22.00
56    48     -4           -2            8     16      4    47.99            .01       .0001    58.44            -2.44      5.95
68    61      8           11           88     64    121    54.01           6.99     48.86      68.61             -.61       .37

ΣX = 480    ΣY = 400    Σx = 0    Σy = 0    Σxy = 654    Σx² = 1304    Σy² = 836    Σ(Y - Ŷ)² = 508    Σ(X - X̂)² = 792.37

X̄ = 60    Ȳ = 50

b = Σxy/Σx² = .5015                       b* = Σxy/Σy² = .782
s_X² = Σx²/(n - 1) = 1304/7 = 186.3       s_X = 13.65
s_Y² = Σy²/(n - 1) = 836/7 = 119.4        s_Y = 10.93
s² = Σ(Y - Ŷ)²/(n - 2) = 508/6 = 84.7     s = 9.20
s*² = Σ(X - X̂)²/(n - 2) = 792.37/6 = 132  s* = 11.5
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Suppose we multiply the x and y coordinate values for each student, 
and sum them all. This (^xy) gives us a good measure of how math and 
verbal results: tend to move together. Whenever an observation such as P 1 
falls in the first quadrant in Figure 14-16, both its x and y coordinates will 
be positive, a!nd their product xy positive. This also holds true for any ob- 
servation in tne third quadrant, with both coordinates negative. The product 
is negative only for observations such as P 2 in the second or fourth quadrant, 
(one coordinate positive, the other negative). If X and Y move together, 
most observations will fall in the first and third quadrants; consequently 
most products xy will be positive, as will their sum — a reflection of the 
positive relationship between X and Y. But if Xand Y are negatively related, 
(i.e., when one rises the other falls), the original scatter will run downhill 
rather than uphill; most observations will fall in the second and fourth 
quadrants, yielding a negative value for our ^ X V index. We conclude that 
as an index cj>f correlation, ^xy at least carries the right sign. Moreover, 
when there is no relationship between X and Y, and our observations are 
distributed evenly over the four quadrants, positive and negative terms will 
cancel, and this index will be zero. 

There are just two ways that $\sum xy$ can be improved. First, it depends on
the units in which x and y are measured. (Suppose the math test had been
marked out of 50 instead of 100; x values and our $\sum xy$ index would be only
half as large, even though the degree to which verbal and mathematical
performance is related would not have changed.) This difficulty is avoided
by measuring both x and y in terms of standard units; i.e., both x and y
are divided by their observed standard deviations:

    $x_i = \frac{X_i - \bar{X}}{s_X}, \qquad y_i = \frac{Y_i - \bar{Y}}{s_Y}$    (14-6)

where, of course,

    $s_X^2 = \frac{1}{n-1} \sum (X_i - \bar{X})^2$  and  $s_Y^2 = \frac{1}{n-1} \sum (Y_i - \bar{Y})^2$    (14-7)

and $x_i$ and $y_i$ now denote coordinates in standard units. This step is shown in Figure 14-1c.

FIG. 14-2 Scatter diagrams and their associated correlation coefficients.

Our new index $\sum x_i y_i$ has only one remaining flaw: it is dependent on
sample size. (Suppose we observed exactly the same sort of scatter from a
sample of double the size; our index would also double, even though the
picture of how these variables move together is the same.) To avoid this
problem we divide by the sample size n, or rather n - 1, the divisor in
(14-7). This yields the sample correlation coefficient:



    $r = \frac{1}{n-1} \sum x_i y_i$    (14-8)

which is recognized to be our definition in (14-4). r may be expressed in
terms of the original observations $(X_i, Y_i)$ by substituting $s_X$ and $s_Y$ in
(14-7) into (14-4) and cancelling (n - 1):

    $r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \, \sum (Y_i - \bar{Y})^2}}$    (14-9)

Example

The data in Table 14-1 are applied to (14-9) to calculate the correlation
coefficient between the math and verbal scores of our sample of eight students.

    $r = \frac{654}{\sqrt{(1304)(836)}} = .62$    (14-10)
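The calculation in (14-9) can be checked with a short script. The following is only a sketch: the function name `correlation` and the rescaling check are illustrative choices, not part of the original text; the data are the eight pairs of Table 14-1.

```python
import math

# Math (X) and verbal (Y) scores from Table 14-1.
X = [36, 80, 50, 58, 72, 60, 56, 68]
Y = [35, 65, 60, 39, 48, 44, 48, 61]

def correlation(xs, ys):
    """Sample correlation r via (14-9): sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    x = [v - x_bar for v in xs]          # deviations from the mean
    y = [v - y_bar for v in ys]
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

r = correlation(X, Y)

# Marking the math test out of 50 instead of 100 halves every X value.
# The raw index sum(xy) would halve too, but r is unchanged.
r_half_scale = correlation([v / 2 for v in X], Y)

print(r, r_half_scale)
```

The second call illustrates why the index is expressed in standard units: a change in the units of X alters the raw sum of products but not r.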



Some idea of how r behaves is given in Figure 14-2; especially note
diagram b. When there is a perfect linear association, the product of the
coordinates in every case is positive; thus, their sum (and the resulting
coefficient of correlation) is as large as possible. The same argument holds for
the perfect inverse relation of Y and X shown in diagram d. This suggests
that r has an upper limit of +1 and a lower limit of -1. (This is proved in
Section (f) below.)

Finally compare diagrams e and f. Our calculation of r in either case is
zero, because positive products of the coordinates are offset by negative
ones. Yet when we examine the two scatters, no relation between X and Y
is confirmed in e, but a strong relation is evident in f; in this case a knowledge
of X will tell us a great deal about Y. A zero value for r therefore does not
imply "no relation"; rather, it means "no linear relation." Thus correlation
is a measure of linear relation only; it is of no use in describing nonlinear
relations. This brings us to the next critical question: in calculating r,
what can we infer about the underlying population ρ?
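The distinction between "no relation" and "no linear relation" can be illustrated numerically. In the sketch below the data are hypothetical (they are not the points plotted in Figure 14-2): in one set Y is an exact linear function of X, and in the other Y is an exact quadratic function of X over a symmetric range.

```python
import math

def correlation(xs, ys):
    """Sample correlation r computed from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    x = [v - x_bar for v in xs]
    y = [v - y_bar for v in ys]
    sum_xy = sum(a * b for a, b in zip(x, y))
    return sum_xy / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))

# Hypothetical X values, symmetric about zero.
X = [-3, -2, -1, 0, 1, 2, 3]

Y_line = [2 * v + 1 for v in X]    # perfect linear relation
Y_curve = [v * v for v in X]       # perfect, but nonlinear, relation

r_line = correlation(X, Y_line)
r_curve = correlation(X, Y_curve)

print(r_line, r_curve)
```

Although X determines Y exactly in both cases, only the linear case gives r = 1; in the quadratic case the positive and negative products cancel.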



(c) Inference from r to ρ

Before we can draw any statistical inference about ρ from our sample
statistic r, we must clarify our assumptions about the parent population
from which our sample was drawn. In our example, this would be the math
and verbal marks scored by all college entrants.

This population might appear as in Figure 14-3, except that there would,
of course, be many more dots in this scatter, each representing another
student. If we subdivide both X and Y into class intervals, the area in our
diagram will be divided up in a checkerboard pattern. From the relative
frequency (sampling probability) in each of the squares, the histogram in
Figure 14-4 is constructed.1 The histogram would have approximately the
shape of the probability density in Figure 14-5. To conclude: in examining
a random student, neither his math score X nor his verbal score Y is
predetermined; both are random variables. Compare this with our example in
Chapter 11, where one variable (fertilizer) was predetermined.

FIG. 14-3 Bivariate population scattergram (math and verbal scores).

This distribution in Figure 14-5 is called "bivariate normal." This
means that the conditional distribution of X or of Y is always normal.
Specifically, if we slice the surface at any value of Y (say $Y_0$), the shape of
the resulting cross section is normal. Similarly, if we select any X value
(say $X_0$) and slice the surface in this other direction, the resulting cross section
is also normal.

It is worthwhile pausing briefly to consider the alternative way that the
bivariate normal population shown in three dimensions in Figure 14-5 can
be graphed in two dimensions. Instead of slicing the surface vertically as we
did in that diagram, slice it horizontally as in Figure 14-6. The resulting
cross section is an ellipse, representing all X, Y combinations with the same
probability density. This "isoprobability" curve is marked "c" in the
two-dimensional X, Y space in Figure 14-7; isoprobability ellipses defined when
this surface is sliced horizontally at higher and lower levels are also shown.
(Once again, many social scientists will recognize this as the familiar strategy
of forcing a three-dimensional function into a two-dimensional space by
showing one variable as a set of isoquants, isobars, or whatever.) It will also
be useful in Figure 14-7 to mark the major axis (d) common to all these
isoprobability ellipses. If the bivariate normal distribution concentrates about
its major axis, ρ increases. Several examples of populations, and their
associated correlation coefficients ρ, are shown in Figure 14-8.

FIG. 14-4 Bivariate population histogram.

FIG. 14-5 Bivariate normal distribution.

1 Our example is of a finite population, but a similar argument would apply for an infinite
population. Moreover, instead of using heights for probabilities, we could use dots of
different sizes; see Figure 5-4a.

Provided that the parent population is bivariate normal, inferences
about the population ρ can easily be made from a sample correlation r.
Recall the inferences about π from P in Chapter 8. Using the same reasoning
that established Figure 8-4, Figure 14-9 is constructed. Thus from any
sample r, a 95% confidence interval for the population ρ can be found. For
example, if a sample of 25 students has r = .80, the 95% confidence interval
for ρ is read vertically as

    .58 < ρ < .90    (14-11)

FIG. 14-6 An isoprobability ellipse from a bivariate normal surface.

FIG. 14-7 The bivariate normal distribution shown as a set of isoprobability ellipses.

FIG. 14-9 95% confidence bands for correlation ρ in a bivariate normal population, for
various sample sizes n. (This chart is reproduced with the permission of Professor E. S.
Pearson from F. N. David, Tables of the Ordinates and Probability Integral of the
Distribution of the Correlation Coefficient in Small Samples, Cambridge University Press, 1938.)

Because of space limitations, we shall concentrate in the balance of this
chapter on sample correlations, and ignore the corresponding population
correlations. But each time a sample correlation is introduced, it should be
recognized that an equivalent population correlation is defined similarly, and
inferences may be made about it from the sample correlation.
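When the chart of Figure 14-9 is not at hand, a rough interval can be computed instead. The sketch below does not reproduce the chart; it substitutes a different, approximate technique (Fisher's z transformation, which treats the inverse hyperbolic tangent of r as roughly normal with standard error $1/\sqrt{n-3}$). The function name `fisher_interval` and the normal critical value 1.96 are assumptions of this illustration.

```python
import math

def fisher_interval(r, n, z_crit=1.96):
    """Approximate confidence interval for rho using Fisher's z transformation.

    This is a large-sample approximation, not the exact small-sample
    distribution on which the chart in Figure 14-9 is based.
    """
    z = math.atanh(r)                    # transform r to the z scale
    std_err = 1.0 / math.sqrt(n - 3)     # approximate standard error of z
    low = math.tanh(z - z_crit * std_err)
    high = math.tanh(z + z_crit * std_err)
    return low, high

# The example in the text: a sample of 25 students with r = .80.
low, high = fisher_interval(0.80, 25)
print(low, high)
```

For this example the approximation gives limits close to, though not identical with, the interval (14-11) read from the chart.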



PROBLEMS 

14-1

    Son's Height    Father's Height
      (inches)         (inches)
         68               64
         66               66
         72               71
         73               70
         66               69

From the above random sample of 5 son and father heights, find

(a) The sample correlation r;

(b) The 95% confidence interval for the population correlation ρ;

(c) At the 5% significance level, can you reject the hypothesis that
ρ = 0?

=> 14-2 From the following sample of student grades,

    Student    First Test X    Second Test Y
       A            80              90
       B            60              70
       C            40              40
       D            30              40
       E            40              60

(a) Calculate r, and find a 95% confidence interval for ρ;

(b) Calculate the regression of Y on X, and find a 95% confidence
interval for β;

(c) Graph the 5 data points and the estimated regression line;

(d) At the 5% significance level, can you reject

    (1) The null hypothesis ρ = 0?

    (2) The null hypothesis β = 0?



(d) Correlation and Regression

If regression and correlation analysis were both applied to the same
scatter of math (X) and verbal (Y) scores, how would they be related?
Specifically, consider the relation between the estimated correlation r and
the estimated regression slope b. In Problem 11-4(b) it was confirmed that

    $b = \frac{\sum xy}{\sum x^2}$    (14-12)

and from (14-9), noting that both x and y are defined as deviations,

    $r = \frac{\sum xy}{\sqrt{\sum x^2 \, \sum y^2}}$    (14-13)

When (14-12) is divided by (14-13),

    $\frac{b}{r} = \frac{\sqrt{\sum x^2 \, \sum y^2}}{\sum x^2} = \sqrt{\frac{\sum y^2}{\sum x^2}}$    (14-14)

If we divide both the numerator and denominator inside the square root sign
by n - 1,

    $\frac{b}{r} = \sqrt{\frac{\sum y^2/(n-1)}{\sum x^2/(n-1)}} = \frac{s_Y}{s_X}$    (14-15)

or

    $b = r \, \frac{s_Y}{s_X}$    (14-16)

This close correspondence of b and r will play an important role in the
argument later. Note that if either r or b is zero, the other will also be zero.
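Relation (14-16) can be confirmed on the data of Table 14-1. This is a minimal sketch; the variable names are illustrative only.

```python
import math

# Math (X) and verbal (Y) scores from Table 14-1.
X = [36, 80, 50, 58, 72, 60, 56, 68]
Y = [35, 65, 60, 39, 48, 44, 48, 61]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n
x = [v - x_bar for v in X]               # deviations of X
y = [v - y_bar for v in Y]               # deviations of Y

sum_xy = sum(a * b for a, b in zip(x, y))
sum_xx = sum(a * a for a in x)
sum_yy = sum(b * b for b in y)

b = sum_xy / sum_xx                      # regression slope, (14-12)
r = sum_xy / math.sqrt(sum_xx * sum_yy)  # correlation, (14-13)
s_x = math.sqrt(sum_xx / (n - 1))        # standard deviation of X
s_y = math.sqrt(sum_yy / (n - 1))        # standard deviation of Y

# (14-16): the slope equals r rescaled by the ratio of standard deviations.
print(b, r * s_y / s_x)
```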



(e) Explained and Unexplained Variation 



In Figure 14-10 we reproduce our sample of math (X) and verbal (Y)
scores, along with the fitted regression of Y on X, calculated in a
straightforward way from the information set out in Table 14-1. Now, if we wished
to predict a student's verbal score (Y) without knowing X, then the best
prediction would be the average observed value ($\bar{Y}$). At $x_i$, it is clear from
this diagram that we would make a very large error, namely $(Y_i - \bar{Y})$, the
deviation in $Y_i$ from its mean. However, once our regression equation has
been calculated, we predict Y to be $\hat{Y}_i$. Note how this reduces our error,
since $(\hat{Y}_i - \bar{Y})$, the large part of our deviation, is now "explained." This
leaves only a relatively small "unexplained" deviation $Y_i - \hat{Y}_i$. The total
deviation of Y is the sum:

    $(Y_i - \bar{Y}) = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)$, for any i    (14-17)
    total deviation = explained deviation + unexplained deviation

FIG. 14-10 The value of regression in reducing variation in Y. (Labels in the figure: $Y_i - \bar{Y}$ = total deviation; $Y_i - \hat{Y}_i$ = deviation not explained by regression; $\hat{Y}_i - \bar{Y}$ = deviation explained by regression.)

It follows that

    $\sum (Y_i - \bar{Y}) = \sum (\hat{Y}_i - \bar{Y}) + \sum (Y_i - \hat{Y}_i)$    (14-18)

What is surprising is that this same equality holds when these deviations are
squared, i.e.,

    $\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$    (14-19)

or, total variation = explained variation + unexplained variation

where variation is defined as the sum of squared deviations. Recall a very
similar conclusion proved in analysis of variance (10-16); (14-19) can be
established in much the same way.2
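The decomposition (14-19) can be verified directly on the data of Table 14-1. The following sketch fits the least squares line and then sums the three kinds of squared deviation; the variable names are illustrative.

```python
# Math (X) and verbal (Y) scores from Table 14-1.
X = [36, 80, 50, 58, 72, 60, 56, 68]
Y = [35, 65, 60, 39, 48, 44, 48, 61]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# Least squares slope of Y on X.
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))
     / sum((xi - x_bar) ** 2 for xi in X))

# Fitted values on the regression line.
fitted = [y_bar + b * (xi - x_bar) for xi in X]

total = sum((yi - y_bar) ** 2 for yi in Y)                    # total variation
explained = sum((fi - y_bar) ** 2 for fi in fitted)           # explained variation
unexplained = sum((yi - fi) ** 2 for yi, fi in zip(Y, fitted))  # residual variation

print(total, explained, unexplained)
```

The three sums agree, after rounding, with the values 836, 328, and 508 that appear in Tables 14-1 and 14-2.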

Since we may write, according to Problem 11-4(a),

    $(\hat{Y}_i - \bar{Y}) = \hat{y}_i = b x_i$    (14-21)

it is often convenient to rewrite (14-19) as

    $\sum (Y_i - \bar{Y})^2 = b^2 \sum x_i^2 + \sum (Y_i - \hat{Y}_i)^2$    (14-22)
    total variation = variation explained by X + unexplained variation

This equation makes explicit the fact that explained variation is that
accounted for by the estimated regression coefficient b. This procedure of
decomposing total variation and analyzing its components is called "analysis
of variance applied to regression." The components of variance are displayed
in the ANOVA Table 14-2, similar to Table 10-6 (bearing in mind3 that we
are now explaining Y, rather than X). From this, a null hypothesis test on β
may be constructed; as before, the question is whether the ratio of the
explained variance to unexplained variance is sufficiently greater than 1 to
reject the hypothesis that Y is unrelated to X. Specifically, a test of the
hypothesis

    $H_0: \beta = 0$    (14-23)

involves forming the ratio

    $F = \frac{\text{variance explained by regression}}{\text{unexplained variance}}$    (14-24)

A 5% significance test involves finding the critical F value which leaves 5%
of the distribution in the right-hand tail. If the sample F value calculated
from (14-24) exceeds this, reject the hypothesis.

We must emphasize that this is just an alternate way of testing the null
hypothesis (14-23). The first method, using the t distribution to find the
confidence interval for β (as in Section 12-7), is usually preferable.

Table 14-2

(a) General

    Source of Variation          Variation                                              Degrees of       Variance
                                                                                        Freedom (d.f.)
    Explained (by regression)    $\sum (\hat{Y}_i - \bar{Y})^2$ or $b^2 \sum x_i^2$       1              $b^2 \sum x_i^2 / 1$
    Unexplained (residual)       $\sum (Y_i - \hat{Y}_i)^2$                               n - 2          $s^2 = \sum (Y_i - \hat{Y}_i)^2 / (n - 2)$
    Total                        $\sum (Y_i - \bar{Y})^2$                                 n - 1

(b) For Sample of Verbal and Math Scores (Table 14-1)

    Source of Variation          Variation    Degrees of        Variance    F
                                              Freedom (d.f.)
    Explained (by regression)      328             1             328        3.87
    Unexplained (residual)         508             6              84.7
    Total                          836             7

2 For proof, square both sides of (14-17), and sum over all values of i:

    $\sum (Y_i - \bar{Y})^2 = \sum [(\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)]^2$
    $= \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2 + 2 \sum (\hat{Y}_i - \bar{Y})(Y_i - \hat{Y}_i)$    (14-20)

The last term can be rewritten using (14-21):

    $2 \sum (\hat{Y}_i - \bar{Y})(Y_i - \hat{Y}_i) = 2b \sum x_i (Y_i - \hat{Y}_i)$

But this sum vanishes: in fact it was set equal to zero in the normal equation (11-15) used
to estimate our regression line. Thus the last term in (14-20) disappears, and (14-19) is
proved. This same theorem can similarly be proved in the general case of multiple regression.

A further justification of the least squares technique (not mentioned in Chapter 11)
is that it results in this useful relation between explained, unexplained, and total variation.

3 And also noting that our terminology for degrees of freedom has changed, e.g., the total
number of sample observations is now designated simply as n, rather than nr.






Note that the F and t distributions are related, in general, by

    $F = t^2$

where there is one degree of freedom in the numerator of F. Since the F
calculated in (14-24) is just the $t^2$ of (12-36), the ANOVA F-test of this
section is justified.



Example

In Table 14-2(b) the ANOVA calculations are presented for our verbal
and math score example. (The necessary computational details are shown on
the bottom of Table 14-1.) To test β = 0, (14-24) is evaluated to be:

    $F = \frac{328}{84.7} = 3.87$    (14-25)

Since this falls short of 5.99, the critical 5% point of F, we do not reject the
null hypothesis.

The same test of β = 0 could be equivalently done using (12-36):

    $t = \frac{b}{s / \sqrt{\sum x_i^2}} = \frac{.50}{9.2 / \sqrt{1304}} = 1.97$

Since this falls short of 2.45 (the critical value leaving a total of 5% in both
tails of the t distribution), the null hypothesis is not rejected. Since $t^2 = F$
(both for the calculated and for the critical values), the same conclusion
must follow from both tests.

Alternatively, a 95% confidence interval for β could be constructed
from (12-30):

    β = .50 ± (2.45)(.254)
      = .50 ± .62

This includes the value β = 0, once more confirming that $H_0$ cannot be
rejected. (Of course, this inconclusive result may be partly due to the smallness
of the sample.)
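The three equivalent procedures of this example can be reproduced in a few lines. This is a sketch only: the critical values 5.99 and 2.45 are simply copied from the text rather than computed from the F and t distributions, and the variable names are illustrative.

```python
import math

# Math (X) and verbal (Y) scores from Table 14-1.
X = [36, 80, 50, 58, 72, 60, 56, 68]
Y = [35, 65, 60, 39, 48, 44, 48, 61]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n
x = [v - x_bar for v in X]
y = [v - y_bar for v in Y]

sum_xx = sum(a * a for a in x)
sum_xy = sum(a * b for a, b in zip(x, y))
b = sum_xy / sum_xx                                  # estimated slope

explained = b ** 2 * sum_xx                          # 1 degree of freedom
unexplained = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
s2 = unexplained / (n - 2)                           # residual variance

F = explained / s2                                   # ratio (14-24)
std_err_b = math.sqrt(s2 / sum_xx)                   # standard error of b
t = b / std_err_b

# Critical 5% values quoted in the text for 1 and 6 degrees of freedom.
F_CRIT = 5.99
T_CRIT = 2.45

reject_by_F = F > F_CRIT
reject_by_t = abs(t) > T_CRIT
interval = (b - T_CRIT * std_err_b, b + T_CRIT * std_err_b)

print(F, t, interval, reject_by_F, reject_by_t)
```

All three routes lead to the same decision: the F ratio and the t ratio both fall short of their critical values, and the interval for β includes zero.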

(f) Interpretation of Correlation 

These variations in Y are now related to r. It follows from (14-14) that

    $b = r \sqrt{\frac{\sum y_i^2}{\sum x_i^2}}$    (14-26)

Substituting this value for b in (14-22),

    $\sum (Y_i - \bar{Y})^2 = r^2 \sum y_i^2 + \sum (Y_i - \hat{Y}_i)^2$    (14-27)

Noting that $\sum y_i^2$ is by definition $\sum (Y_i - \bar{Y})^2$, the solution for $r^2$ is:

    $r^2 = \frac{\sum (Y_i - \bar{Y})^2 - \sum (Y_i - \hat{Y}_i)^2}{\sum (Y_i - \bar{Y})^2}$    (14-28)

Finally, we can reexpress the numerator by noting (14-19). Thus

    $r^2 = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} = \frac{\text{explained variation of } Y}{\text{total variation of } Y}$    (14-29)

This equation provides a clear intuitive interpretation of $r^2$. (Note that
this is the square of the correlation coefficient r, often called the coefficient
of determination.) It is the proportion of the total variation in Y explained
by fitting the regression. Since the numerator cannot exceed the denominator,
the maximum value of the right-hand side of (14-29) is 1. Since the maximum
value of $r^2$ is 1, the limits on r are ±1. These two limits were illustrated in
Figure 14-2: in part (b), r = 1 and all observations lie on a straight line
running uphill; in part (d), r = -1 and this perfect inverse correlation
reflects the fact that all observations lie on a straight line running downhill.
In either case, a regression fit will explain all the variation in Y.

When r = 0 (and $r^2 = 0$) the explained variation of Y is zero and a
regression line explains nothing; i.e., the regression line will be parallel to
the X-axis, with b = 0. Thus r = 0 and b = 0 are seen to be equivalent ways
of formally stating "no observed linear relation between X and Y."



(g) Regression Analysis Applied to a Bivariate Normal Population

In Table 14-1 a regression was calculated for sample values assumed
taken from a bivariate normal population. We now ask: "Is the b we
calculated an estimator of a population β, or does β even exist? For a bivariate
normal population, does there exist a true regression line of Y on X?" It
will now be shown that the answer is yes.

Our assumed bivariate normal population is shown in Figure 14-11 as
a set of isoprobability ellipses, with major axis d. Now consider the straight
line Y = α + βX, defined by joining points of vertical tangency such as $P_1$
and $P_2$. Each of these vertical tangents defines a cross section slice of Y
which is normal. Concentrating on the slice through $P_1 Q_1$, for example, we
see that the mean of these Y values occurs at the point of tangency $P_1$; at
this point our vertical line touches its highest isoprobability ellipse, and the
highest point on any normal curve is at the mean. Thus we see that the means
of the Y populations lie on the straight line Y = α + βX. Next, the variance
of the Y populations can be shown to be constant.4 Thus the assumptions of
the regression model (12-2) are satisfied by a bivariate normal (correlation)
population. The line Y = α + βX may therefore be regarded as a true linear
regression of Y on X.

FIG. 14-11 Two regression lines found from the isoprobability ellipses. (Horizontal axis: math score X.)

Thus if we know a student's math score and we wish to predict his verbal
score, this regression line would be appropriate (e.g., if his math score
were $X_1$ we would predict his verbal score to be $P_1$). It is important to fully
understand why we would not predict $Q_1$; i.e., we do not use the major axis
of the ellipse (line d) for prediction, even though this represents "equivalent"
performance on the two tests. Since this student is far above average in
mathematics, an equivalent verbal score seems too optimistic a prediction.
Recall that there is a large random element involved in performance. There
are a lot of students who will do well in one exam, but poorly in the other;
technically, ρ is less than 1 for this population. Therefore, instead of
predicting at $Q_1$ we are more moderate and predict at $P_1$, a sort of average5
of "equivalent" performance $Q_1$ and "average" performance $\mu_Y$.

This is the origin of the term regression. Whatever a student's score in
math, there will be a tendency for his verbal score to "regress" toward
mediocrity (i.e., the average).6 It is evident from Figure 14-11 that this is
equally true for a student with a math score below average; in this case the
predicted verbal score regresses upward toward the average.

Another interesting observation is that the correlation coefficient
between X and Y is unique (i.e., $\rho_{XY}$ is identically $\rho_{YX}$); but there are two
regressions, the regression of Y on X and the regression of X on Y. This is
immediately evident if we ask how we would predict a student's math score
(X) if we knew his verbal score (e.g., $Y_3$).

Exactly the same argument holds. Equivalent performance (point $Q_3$ on
line d) is a bad predictor; since he has done very well in the verbal test, we
would expect him to do less well in math, although still better than average.
Thus, the best prediction is $P_3$ on the line X = α* + β*Y, the regression
of X (math) on Y (verbal). This is the direct analogue to our regression of
Y on X, but in this case our regression is defined by joining points ($P_3$, $P_4$,
etc.) of horizontal, rather than vertical tangency. Each of these horizontal
tangents defines a normal conditional distribution of X, given Y; each of
these distributions has the same variance, with its mean lying on this
regression line, thus satisfying the assumptions of our regression model;
hence least squares values a* and b* are used to estimate α* and β*.

4 This may seem like a curious conclusion, since in Figure 14-5 the size of each cross section
slice differs depending on the value of $X_0$. However each slice $p(X_0, Y)$ must be adjusted
by division by $p(X_0)$ in order to define the conditional distribution of Y. Thus recalling
the argument in Section 5-1(c), and in particular equation (5-10), the conditional
distribution is

    $p(Y \mid X_0) = \frac{p(X_0, Y)}{p(X_0)}$

In fact, this adjustment makes all the conditional distributions of Y "look alike," and
thus have the same variance.

5 $P_1$ is in fact a weighted average of $Q_1$ and $\mu_Y$, with weights depending on ρ. Thus in the
limiting case in which ρ = 1, X and Y are perfectly correlated, and we would predict Y
at $Q_1$. At the other limit, in which ρ = 0, we can learn nothing about likely performance
on one test from the result of the other, and we would predict Y at $\mu_Y$. But for all cases
between these two limits, we predict using both $Q_1$ and $\mu_Y$; and the greater the ρ, the
more heavily $Q_1$ is weighted.

6 A classical case, encountered by Pearson & Lee (Biometrika, 1903), involved trying to
predict a son's height from his father's height. If the father is a giant the son is likely to be
tall; but there are good reasons for expecting him to be shorter than his father. (For
example, how tall was his mother? And his grandparents? And so on.) So the prediction
for the son was derived by "regressing" his father's height towards the population average.

FIG. 14-12 Regressions estimated from a sample of verbal and math scores.
Example

Our sample of eight students' scores shown in Figure 14-1 and Table
14-1 was, by assumption, drawn from a bivariate normal population as
shown in Figure 14-11. We have already estimated ρ with

    r = .62    (14-10) repeated

And from Table 14-1, we estimated Y = α + βX with

    $\hat{Y} = 50 + .50x$    (14-30)

    $= 20 + .50X$    (14-31)

We now estimate X = α* + β*Y. The coefficients in this simple regression
of X on Y are calculated in Table 14-1; this involves using the estimating
equations (11-13) and (11-16), taking care to interchange X and Y throughout.
Thus

    $\hat{X} = 60 + .78y$

    $= 60 + .78(Y - \bar{Y})$    (14-32)

    $= 21 + .78Y$    (14-33)

The two estimated regressions (14-31) and (14-33) are shown in Figure
14-12. Thus, for example, the predicted verbal score of a student with a math
result of 90 is 65; and the predicted math score of a student with a verbal
result of 30 is 44.4.
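Both regressions, and the two predictions just quoted, can be recomputed from the data of Table 14-1. In this sketch the helper names `predict_verbal` and `predict_math` are illustrative; because the slopes are carried at full precision rather than rounded to .50 and .78, the predictions differ slightly in the second decimal place from the figures in the text.

```python
# Math (X) and verbal (Y) scores from Table 14-1.
X = [36, 80, 50, 58, 72, 60, 56, 68]
Y = [35, 65, 60, 39, 48, 44, 48, 61]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n
x = [v - x_bar for v in X]
y = [v - y_bar for v in Y]

sum_xy = sum(a * b for a, b in zip(x, y))
b = sum_xy / sum(a * a for a in x)        # slope of Y on X
b_star = sum_xy / sum(c * c for c in y)   # slope of X on Y (X and Y interchanged)

def predict_verbal(math_score):
    """Regression of Y on X: predict verbal score from math score."""
    return y_bar + b * (math_score - x_bar)

def predict_math(verbal_score):
    """Regression of X on Y: predict math score from verbal score."""
    return x_bar + b_star * (verbal_score - y_bar)

print(predict_verbal(90), predict_math(30))
```

Note that the two lines are distinct: each is fitted by minimizing squared errors in the variable being predicted.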
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(h) When Correlation, When Regression? 



Both the standard regression and correlation models require that Y be
a random variable. But the two models differ in the assumptions made
about X. The regression model makes few assumptions about X, but the
more restrictive correlation model of this chapter requires that X be a random
variable, having with Y a bivariate normal distribution. We therefore
conclude that the standard regression model has wider application. Regression
may be used, for example, to describe the fertilizer-yield problem in Chapter
11 where X was fixed, or the bivariate normal population of X and Y in
this chapter. However, the standard correlation model describes only the
latter. (It is true that $r^2$ can be calculated even when X is fixed, as an indication
of how effectively regression reduces variation; but r cannot be used for
inferences about ρ in Figure 14-9.)

In addition, regression answers more interesting questions. Like
correlation, it indicates if two variables move together; but it also estimates
how. Moreover, it can be shown that a key issue in correlation analysis, the
test of the null hypothesis

    $H_0: \rho = 0$    (14-34)

can be answered directly from regression analysis by testing the equivalent
null hypothesis

    $H_0: \beta = 0$    (14-35)

Thus rejection of β = 0 implies rejection of ρ = 0, and the conclusion that
correlation does exist between X and Y. If this is the only correlation
question, then it can be answered by the regression test of (14-35), and
there is no need to introduce correlation analysis at all.

Since regression answers a broader and more interesting set of questions
(and some correlation questions as well), it becomes the preferred technique;
correlation is useful primarily as an aid to understanding regression, and as
an auxiliary tool.



(i) "Nonsense" Correlations 

In interpreting correlation, one must keep firmly in mind that absolutely 
no claim is made that this necessarily indicates cause and effect. For example, 
suppose that the correlation of teachers' salaries and the consumption of 
liquor over a period of years turns out to be .98. This would not prove that
teachers drink; nor would it prove that liquor sales increase teachers' salaries.
Instead, both variables moved together, because both are influenced by a 
third variable — long-run growth in national income. If only third factors of 
this kind could be kept constant — or their effects fully discounted — then 
correlation would become more meaningful. This is the objective of partial 
correlation in the next section. 

Correlations such as the above are often called "nonsense" correlations. 
It would be more accurate to say that the observed mathematical correlation 
is real enough, but any naive inference of cause and effect is nonsense. 
Moreover, it should be recognized that the same charge can also be leveled at 
the conclusions sometimes drawn from regression analysis. For example, a 
regression applied to teachers' salaries and liquor sales would also yield a 
statistically significant b coefficient. Any inference of cause and effect from 
this would still be nonsense. 

Although correlation and regression cannot be used as proof of cause 
and effect, these techniques are very useful in two ways. First, they may 
provide further confirmation of a relation that theory tells us should exist 
(e.g., prices depend on wages). Second, they are often helpful in suggesting 
causal relations that were not previously suspected. For example, when 
cigarette smoking was found to be highly correlated with lung cancer, 
possible links between the two were investigated further. This included more 
correlation studies in which third factors were more rigidly controlled, as 
well as extra-statistical studies such as experiments with animals, and 
chemical theories. 



PROBLEMS 

14-3 For the following random sample of 5 shoes, find

(a) The proportion of the variation in Y explained by regression on X.

(b) The proportion unexplained.

(c) Whether Y depends on X, at the 5% significance level. Answer this
in three alternate ways: using the F test, t test, and a 95% confidence
interval.

    X = Cost of Shoe    Y = Months of Wear
          10                    8
          15                   10
          10                    6
          20                   12
          20                    9

14-4 Suppose a bivariate normal distribution of scores is perfectly symmetric,
with ρ = .50 and with isoprobability ellipses as follows:

    [Figure: isoprobability ellipses, with the value 80 marked on each axis.]

True or False? If false, correct it.

(a) The regression curve of Y on X is
    Y = 80 + .5(X - 80)

(b) The regression line of Y on X has graph as follows:

    [Figure: a line plotted against axes X and Y, with the value 80 marked on each axis.]

(c) The variance of Y is 1/4 the variance of X.

(d) The proportion of the Y variation explained by X is only 1/4.

(e) Thus the residual Y values (after fitting X) would have 3/4 the
variance of the original Y values.

14-5 Let b and b* be the sample regression slopes of Y on X, and X on Y,
for any given scatter of points.
True or False? If false, correct it.



(a) b = 



(b) fc„ fc= r ^ 

(c) $bb^* = r^2$.

(d) If b > 1, then b* < 1 necessarily.

(e) If b < 1, then b* > 1 necessarily.

14-6 In the following graph of 4 students' marks find geometrically (without
doing any algebraic calculations):

(a) The regression line of Y on X.

(b) The regression line of X on Y.

(c) The correlation r (Hint: Problem 14-5c).

(d) The predicted Y-score of a student with X-score of 70.

(e) The predicted X-score of a student with Y-score of 70.

    [Graph: marks of four students; horizontal axis "Term grade X" marked 40, 60, 80; vertical axis marked 40 and 80.]

14-2 PARTIAL CORRELATION 

As soon as we move from the simple two- variable case to relations 
which involve more than two variables, complications arise. To illustrate, 
consider a simple three-variable example: suppose that yield of hay (Y) 
depends on spring temperature (X) and rainfall (Z). 

Following the techniques of Chapter 13 we could fit the following 
regression plane to a scatter of observations of Y, X, and Z: 

Y = a + bX + cZ (14-36) 

Recall how we interpreted the multiple regression coefficient: b estimates 
how Y is related to X if Z were constant. The partial correlation coefficient 
r xY.z is a similar concept. It estimates how X and Y move together if Z 
were held constant. (For convenience variables Y 9 X, and Z are often defined 
as variables 1, 2, and 3; thus r Y x.z becomes r 12>3 , the partial correlation of 
the first two variables, when the third is assumed constant.) 

While the previous sections of this chapter correspond to the simple 
regression analysis of Chapter 12, the partial correlation analysis in this 
section corresponds to the multiple regression analysis of Chapter 13. Thus we 
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could embark liere on a whole chapter on partial correlation, and a long 
one at ;hat. However, since we have argued in the previous section that 
correlation is 'relatively less important, we confine ourselves to a brief 
intuitive introcuction to this concept, and how it may be used. 

The following assumptions are generally made about the parent population.
The distribution of X, Y, and Z is multivariate normal. This implies
that for any value of Z, the conditional distribution of Y and X is bivariate
normal as shown in Figure 14-5. $\rho_{YX.Z}$ is defined as the simple correlation
of this conditional joint distribution of X and Y.

In computing its estimator $r_{YX.Z}$ a problem arises. Since Z is a random
variable it is simply not possible to fix a single value $Z_0$ and sample the
corresponding conditional distribution of X and Y. Thus, unless the sample
is extremely large, it is unlikely that more than a single Y, X, $Z_0$ combination
involving $Z_0$ will be observed. The alternative is to compute $r_{YX.Z}$ as the
correlation of Y and X after the influence of Z has been removed from each.⁷

The resulting partial correlation $r_{YX.Z}$ can, after considerable manipulation,
be expressed as the simple correlation of Y and X ($r_{YX}$), adjusted by
applying the two simple correlations involving Z (namely $r_{XZ}$ and $r_{YZ}$) as
follows:

$r_{YX.Z} = \frac{r_{YX} - r_{YZ}\, r_{XZ}}{\sqrt{1 - r_{XZ}^2}\,\sqrt{1 - r_{YZ}^2}}$ (14-40)

This formula shows explicitly that there need be no close correspondence
between the partial and simple correlation coefficient; however, in the
special case that both X and Y are completely uncorrelated with Z (i.e.,
$r_{XZ} = r_{YZ} = 0$), then (14-40) reduces to:

$r_{YX.Z} = r_{YX}$ (14-41)

and, as we would expect, the partial and simple correlation coefficients are
the same.

It is instructive to note what happens at the other extreme when X
becomes perfectly correlated with Z. In this case $r_{YX.Z}$ cannot be calculated
since $r_{XZ} = 1$ and the denominator of (14-40) becomes zero as a consequence.
This is recognized as the multicollinearity problem of Chapter 13,
where the corresponding multiple regression estimate b could not be defined.

7 By the "influence" of Z on Y we mean the fitted regression of Y on Z:

$\hat{Y} = a + bZ$ (14-37)

By "removing the influence," we mean subtracting the fitted from the observed Y value,
obtaining the residual deviation:

$u = Y - \hat{Y} = Y - a - bZ$ (14-38)

which is recognized to be that part of Y not explained by Z. Similarly, we obtain v, the
residual deviation of X from its fitted value on Z. The partial correlation coefficient $r_{XY.Z}$
is the simple correlation of u and v, thus:

$r_{XY.Z} = r_{uv}$ (14-39)

The parallel statistical properties of b and $r_{YX.Z}$ can be extended
further: rejection of the hypothesis that $\beta = 0$ in Chapter 13 is equivalent to
rejecting the null hypothesis that $\rho_{YX.Z} = 0$. Again, one reason for emphasizing
regression analysis is confirmed: multiple regression will not only
answer its own set of regression questions, but partial correlation
questions as well.
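The two routes to the partial correlation described in this section, the residual definition (14-39) and the formula (14-40), can be checked against each other numerically. The sketch below is only an illustration: the data are made up, and the helper names (`corr`, `residuals`, `partial_corr_formula`, `partial_corr_residuals`) are hypothetical, not taken from the text.

```python
import math

def corr(u, v):
    """Simple correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def residuals(y, z):
    """Residuals of y after removing the fitted simple regression of y on z."""
    n = len(y)
    my, mz = sum(y) / n, sum(z) / n
    slope = (sum((zi - mz) * (yi - my) for yi, zi in zip(y, z))
             / sum((zi - mz) ** 2 for zi in z))
    intercept = my - slope * mz
    return [yi - (intercept + slope * zi) for yi, zi in zip(y, z)]

def partial_corr_formula(y, x, z):
    """Partial correlation of y and x given z, using formula (14-40)."""
    r_yx, r_yz, r_xz = corr(y, x), corr(y, z), corr(x, z)
    return (r_yx - r_yz * r_xz) / (math.sqrt(1 - r_xz ** 2) * math.sqrt(1 - r_yz ** 2))

def partial_corr_residuals(y, x, z):
    """Partial correlation as the simple correlation of the two residual series (14-39)."""
    return corr(residuals(y, z), residuals(x, z))

# Made-up illustrative data (an assumption, not from the text)
Y = [40, 45, 50, 65, 70, 70, 80]
X = [10, 20, 30, 40, 50, 60, 70]
Z = [36, 33, 37, 37, 34, 32, 36]

by_formula = partial_corr_formula(Y, X, Z)
by_residuals = partial_corr_residuals(Y, X, Z)
print(round(by_formula, 6), round(by_residuals, 6))
```

The two printed values should agree up to rounding error. If X were an exact linear function of Z, the formula version would divide by zero, which is the multicollinearity case discussed above.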



14-3 MULTIPLE CORRELATION 

A partial correlation coefficient may be computed for each independent
variable in a multiple regression. In addition, one single overall index of
the value of fitting the multiple regression equation can be defined: the multiple
correlation coefficient, R, is the simple correlation coefficient of the observed
Y and the corresponding fitted $\hat{Y}$. Thus, if our estimated regression is:

$\hat{Y} = a + bX + cZ$ (14-42)

then

$R = r_{Y\hat{Y}}$ (14-43)

This has all the nice algebraic properties of any simple correlation. In
particular, we note (14-29) which takes the form

$R^2 = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} = \frac{\text{explained variation of } Y}{\text{total variation of } Y}$ (14-44)

Note that this is identical to $r^2$ if there is only one regressor (independent
variable). If there is more than one regressor, then the numerator represents
the variation of Y explained by all of them [with $\hat{Y}$ estimated from the full
multiple regression, e.g., (14-42)]. Thus, as we add additional explanatory
variables to our model, by watching how fast $R^2$ increases we can immediately
see in (14-44) how helpful these variables are in improving our explanation
of Y. Our conclusion is the same as in simple correlation: one of the major
values of calculating $R^2$ is to clarify how successfully our regression explains
the variation in Y.
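As a numerical check on (14-43) and (14-44), the following sketch fits a two-regressor plane by least squares and compares the squared correlation of Y with its fitted values against the ratio of explained to total variation. The data and the function names (`fit_two_regressors`, `corr`) are illustrative assumptions, not part of the text.

```python
import math

def mean(v):
    return sum(v) / len(v)

def corr(u, v):
    """Simple correlation of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def fit_two_regressors(y, x, z):
    """Least-squares a, b, c in Y-hat = a + bX + cZ (normal equations in deviation form)."""
    my, mx, mz = mean(y), mean(x), mean(z)
    dx = [xi - mx for xi in x]
    dz = [zi - mz for zi in z]
    dy = [yi - my for yi in y]
    sxx = sum(d * d for d in dx)
    szz = sum(d * d for d in dz)
    sxz = sum(p * q for p, q in zip(dx, dz))
    sxy = sum(p * q for p, q in zip(dx, dy))
    szy = sum(p * q for p, q in zip(dz, dy))
    det = sxx * szz - sxz ** 2          # zero when X and Z are perfectly correlated
    b = (sxy * szz - sxz * szy) / det
    c = (sxx * szy - sxz * sxy) / det
    a = my - b * mx - c * mz
    return a, b, c

# Made-up illustrative data (an assumption, not from the text)
Y = [40, 45, 50, 65, 70, 70, 80]
X = [10, 20, 30, 40, 50, 60, 70]
Z = [36, 33, 37, 37, 34, 32, 36]

a, b, c = fit_two_regressors(Y, X, Z)
fitted = [a + b * xi + c * zi for xi, zi in zip(X, Z)]

R = corr(Y, fitted)                                        # definition (14-43)
explained = sum((f - mean(Y)) ** 2 for f in fitted)
total = sum((yi - mean(Y)) ** 2 for yi in Y)
ratio = explained / total                                  # right side of (14-44)
print(round(R ** 2, 6), round(ratio, 6))
```

The two printed numbers should coincide up to rounding, which is the content of (14-44).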

It remains, finally, to relate this to our t-test of multiple regression
coefficients, using our example in (13-15). We could extend (14-22) to

Total variation = variation explained by ($X_1, \ldots, X_3$) + additional
variation explained by $X_4$ + unexplained variation (14-45)

We could set this up in an ANOVA table like Table 10-10, and construct the
ratio

$F = \frac{\text{additional variance explained by } X_4}{\text{unexplained variance}}$ (14-46)

just as the ratio (10-30) was constructed. A test of significance of this observed
value of F is thus seen to be a test of the significance of the (last
included) regressor $X_4$. Similarly, we could construct an observed F ratio
for each of the other regressors in turn. These F values are translated into
t values⁸ that appear under equation (13-15), and lend themselves so easily
to tests of significance on each regressor.
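The ratio (14-46) can be sketched in terms of proportions of variation, since the total variation cancels from numerator and denominator. The function name, the argument names, and the numbers below are assumptions made for illustration; only the structure (additional variance over unexplained variance, with 1 degree of freedom in the numerator) comes from the text.

```python
import math

def added_regressor_F(r2_reduced, r2_full, n, k_full):
    """F ratio for the last regressor added.

    r2_reduced: proportion of variation explained without the last regressor
    r2_full:    proportion explained with it
    n:          number of observations
    k_full:     number of regressors in the full model
    """
    additional_variance = (r2_full - r2_reduced) / 1          # 1 d.f. for one added regressor
    unexplained_variance = (1 - r2_full) / (n - k_full - 1)   # residual d.f.
    return additional_variance / unexplained_variance

# Hypothetical numbers: 20 observations, 4 regressors in the full model
F = added_regressor_F(r2_reduced=0.60, r2_full=0.70, n=20, k_full=4)
t = math.sqrt(F)   # magnitude only; the sign is that of the estimated coefficient
print(round(F, 3), round(t, 3))
```

With these made-up inputs the ratio works out to F = .10/(.30/15) = 5, so that the corresponding t has magnitude about 2.24.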


PROBLEMS 

14-7 For the data of Problem 13-1 relating savings S to income Y and
assets W, find

(a) $r_{SY}$, the simple correlation of S and Y.

(b) $r_{SY.W}$, the partial correlation of S and Y, holding W fixed.

(c) R, the multiple correlation of S on Y and W.

(d) The proportion of the variation of S which is explained by

(1) Y alone;

(2) Y and W.

(e) Comparing (a) and (c), is R larger than r in this problem? Is
R necessarily larger than r always?

(f) Is $r_{SY}$ or $r_{SY.W}$ a better measure of "how S and Y are related,
other things being equal"?

14-8 Repeat Problem 14-7, using the data of Problem 13-2 and
substituting N for W throughout.

*14-9 Following Problem 14-7(d), find

(a) The proportion of the variation explained by the addition of W
as a regressor.

(b) The proportion of the variation which is unexplained after
regression of S on Y and W.

(c) How many degrees of freedom are there for the two components
of variation in (a) and (b)?

(d) Using parts (a), (b), and (c), calculate the variance ratio F, to
test the statistical significance of adding W to the regression model.

(e) Calculate $t = \pm\sqrt{F}$. This is one way the t-values could be found
in an equation such as (13-15).

*14-10 Repeat the steps of Problem 14-9 to find the t-value to test the
statistical significance of adding Y as a regressor after S is regressed
on W.



8 Using, of course, $t = \pm\sqrt{F}$, with 1 degree of freedom in the numerator.

This chapter is devoted to making decisions in the face of uncertainty.
A large part of the discussion involves Bayesian methods, which are not only 
useful for their own sake, but also sharpen our understanding of the limita- 
tions of classical statistics. 

15-1 PRIOR AND POSTERIOR DISTRIBUTIONS 

Problem 3-24b on Bayes' theorem is important enough to repeat, in a 
slightly altered form. If we were to predict tomorrow's weather before con- 
sulting a barometer, we would use Table 15-1 : 



Table 15-1 Prior Probabilities

State $\theta$                      Rain ($\theta_1$)    Shine ($\theta_2$)
Prior probability $p(\theta)$            .40                 .60

But we can do better, by using a barometer characterized by Table 15-2: 



Table 15-2 Conditional Probabilities $p(x/\theta)$

                        State $\theta$
Prediction x            Rain ($\theta_1$)    Shine ($\theta_2$)
"Rain" ($x_1$)               .90                 .20
"Shine" ($x_2$)              .10                 .80
Sum                         1.00                1.00

Table 15-3 Posterior Probabilities, $p(\theta/x)$

State $\theta$                                  Rain ($\theta_1$)    Shine ($\theta_2$)
Posterior probability $p(\theta/\text{"rain"})$        .75                 .25
After it is observed that the barometer's prediction is "rain," the prior
probabilities of Table 15-1 are no longer relevant, and should be replaced
by Table 15-3.

We recall that this was derived, diagrammatically, by combining
Tables 15-1 and 15-2 into Figure 15-1. The new sample space is "rain," with
the relative size of these two hatched areas explaining the two posterior
probabilities in Table 15-3.

Since this is so important, we now write down its full formal confirmation.
We use (5-10) to express the probability of rain and "rain" as

$p(\theta_1, x_1) = p(\theta_1)\, p(x_1/\theta_1)$ (15-1)
$= (.4)(.9) = .36$ (15-2)

Similarly, the probability of the state shine and the prediction "rain" is

$p(\theta_2, x_1) = p(\theta_2)\, p(x_1/\theta_2)$ (15-3)
$= (.6)(.2) = .12$ (15-4)

These two calculations define the hatched areas in Figure 15-1. Comparing
areas we conclude that it is three times as likely for a "rain" prediction to be
associated with rain, as with shine. Formally, the hatched area in Figure
15-1 becomes the new sample space, within which we calculate the new
(conditional) probabilities.
To do this, we note that

$p(\text{"rain"}) = p(x_1) = .36 + .12 = .48$ (15-5)

[Figure 15-1: the original sample space, divided by state $\theta$ into Rain (.4) and Shine (.6) and by prediction x into "Shine" and "Rain"; the "Rain" prediction region is hatched, with the rain portion labeled (.4)(.9) = .36.]

FIG. 15-1 How posterior probabilities are determined.

Using (5-10) again

$p(\theta_1/x_1) = \frac{p(\theta_1, x_1)}{p(x_1)} = \frac{.36}{.48} = .75$ (15-6)

Similarly

$p(\theta_2/x_1) = \frac{p(\theta_2, x_1)}{p(x_1)} = \frac{.12}{.48} = .25$ (15-7)

When this new (hatched) sample space has its probabilities blown up in
this way by using the divisor $p(x_1)$, the result is the posterior probability
distribution in Table 15-3. This is often written in the more convenient and
general form

$p(\theta/x) = \frac{p(\theta, x)}{p(x)} = \frac{p(\theta)\, p(x/\theta)}{p(x)}$ (15-8)

To keep the mathematical manipulations in perspective, we repeat the
physical interpretation for emphasis. Before the evidence (barometer) is seen,
the prior probabilities $p(\theta)$ give the proper betting odds on the weather. But
after the evidence is in we can do better; the posterior probabilities $p(\theta/x)$
now give the proper betting odds. (This may be intuitively grasped by
appealing to the relative frequency interpretation. Of all the times the
barometer registers "rain," in what proportion will rain actually occur?
The answer is 75%.) As a simple summary, we note that the prior probability
distribution is adjusted by the empirical evidence to yield the posterior
distribution. Schematically:



Prior probabilities $p(\theta)$
        and                                              yields    Posterior probabilities $p(\theta/x)$    (15-9)
Probability of empirical evidence $p(x/\theta)$
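The barometer calculation of (15-1) through (15-8) can be sketched in a few lines. The dictionary layout and the function name `posterior` are illustrative assumptions; the numbers are the prior probabilities of Table 15-1 and the conditional probabilities .9 and .2 that appear in (15-2) and (15-4).

```python
# Prior p(theta), from Table 15-1
prior = {"rain": 0.40, "shine": 0.60}

# Conditional probabilities p(x/theta), stored as likelihood[state][prediction]
likelihood = {
    "rain":  {"rain": 0.90, "shine": 0.10},
    "shine": {"rain": 0.20, "shine": 0.80},
}

def posterior(prior, likelihood, x):
    """Posterior p(theta/x) = p(theta) p(x/theta) / p(x), as in (15-8)."""
    joint = {state: prior[state] * likelihood[state][x] for state in prior}   # p(theta, x)
    p_x = sum(joint.values())                                                 # p(x)
    return {state: joint[state] / p_x for state in joint}

post = posterior(prior, likelihood, "rain")
print({state: round(p, 2) for state, p in post.items()})
```

For the prediction "rain" this reproduces the joint probabilities .36 and .12, the divisor .48, and the posterior probabilities .75 and .25 of Table 15-3.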



PROBLEMS 

15-1 A factory has 3 machines ($\theta_1$, $\theta_2$, and $\theta_3$) making bolts. The newer the
machine, the larger and more accurate it is, according to the following
table:

Machine                                  $\theta_1$ (oldest)    $\theta_2$    $\theta_3$ (newest)
Proportion of total output
  produced by this machine                   10%            40%          50%
Rate of defective bolts it produces           5%             2%           1%

Thus, for example, $\theta_3$ produces half of the factory's output, and of all
the bolts $\theta_3$ produces, 1% are defective.

(a) Suppose a bolt is selected at random; before it is examined, what
is the chance it was produced by machine $\theta_1$? By $\theta_2$? By $\theta_3$?

(b) Suppose the bolt is examined and found defective; after this
examination, what is the chance it was produced by machine $\theta_1$? By
$\theta_2$? By $\theta_3$?

15-2 Suppose a man is drawn at random from a roomful of ten people,
whose heights $\theta$ have the following distribution:

$\theta$ (inches)    $p(\theta)$
    70            .1
    71            .3
    72            .2
    73            .2
    74            .1
    75            .1

(a) Graph this (prior) distribution of $\theta$.

(b) Suppose also that a crude measuring device is available, that makes
errors with the following distribution:

e (error in inches)    p(e)
       -2               .1
       -1               .2
        0               .4
        1               .2
        2               .1

Surely this can help us to be more accurate in estimating the man's
height. For example, suppose his measured height using this crude
device is x = 74 inches. We now have further information about $\theta$;
i.e., this measurement changes the probabilities for $\theta$ from the prior
distribution $p(\theta)$ to a posterior distribution $p(\theta/x = 74)$. Calculate and
graph this posterior distribution.



15-2 OPTIMAL DECISIONS

(a) Example

Suppose a salesman regularly sells umbrellas or lemonade on Saturday
afternoons at football games. To keep matters simple, suppose he has just
three possible options (actions, $a_i$):

$a_1$ = sell only umbrellas;

$a_2$ = sell some umbrellas, some lemonade;

$a_3$ = sell only lemonade.

If he chooses $a_1$ and it rains, his profit is $20; but if it shines, he loses $10.
It will be more convenient to describe everything as a loss (negative profit);
thus his losses will be -20 or +10 respectively.

If he chooses action $a_2$ or $a_3$, there will also be certain losses. All this
information may be assembled conveniently in the following loss table:



Table 15-4 Loss Function $l(a, \theta)$

                State $\theta$
Action a        Rain ($\theta_1$)    Shine ($\theta_2$)
$a_1$                -20                 10
$a_2$                  5                  5
$a_3$                 25                 -7

Suppose further that the probability distribution (long-run relative frequency) 
of the weather is as shown in Table 15-5. 



Table 15-5 Probability Distribution of $\theta$

State $\theta$                 Rain     Shine
Probability $p(\theta)$         .20      .80


What is the best action for the salesman to take? (You are urged to work 
this out, before reading on; it will be easier that way.) 

Solution. If he chooses $a_1$, what could he expect his loss to be, on the
average? Intuitively, we calculate the expected loss if he chooses $a_1$:

$L(a_1) = -20(.20) + 10(.80) = 4$ (15-10)

We recognize this as the concept of expected value,¹ as given by (4-17):

$L(a_1) = l(a_1, \theta_1)\, p(\theta_1) + l(a_1, \theta_2)\, p(\theta_2) = \sum_{\theta} l(a_1, \theta)\, p(\theta)$ (15-11)

Similarly, we evaluate

$L(a_2) = 5(.20) + 5(.80) = 5$ (15-12)

and

$L(a_3) = 25(.20) - 7(.80) = -.6$ (15-13)

In general

$L(a) = \sum_{\theta} l(a, \theta)\, p(\theta)$ (15-14)

The optimal action is seen to be $a_3$, which minimizes the expected loss;
in fact, this is the only option that allows any expected profit. To summarize,
we assemble all our information and calculations in Table 15-6:


Table 15-6 Calculation of the Optimal Action a

            State $\theta$:    Rain ($\theta_1$)    Shine ($\theta_2$)
            $p(\theta)$:            .20                .80          L(a) = expected loss
Action $a_1$                  -20                 10                   4
       $a_2$                    5                  5                   5
       $a_3$                   25                 -7                 -.6  <- minimum

(The entries in the body of the table are the loss function $l(a, \theta)$.)



(b) Generalization

It hardly seems necessary to state that this problem can be generalized
to any number of states $\theta$ or actions a (even an infinite number, as in the
next section). The objective remains the same: to minimize expected loss.
We now pause to consider:

1. The probabilities $p(\theta)$, and

2. The loss function $l(a, \theta)$.
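The calculation of expected loss in (15-14) and the choice of the minimizing action can be sketched as follows. The losses are those of Table 15-4 and the probabilities those of Table 15-5; the variable and function names are illustrative assumptions.

```python
# Loss function l(a, theta) from Table 15-4, stored as loss[action][state]
loss = {
    "a1": {"rain": -20, "shine": 10},
    "a2": {"rain": 5,   "shine": 5},
    "a3": {"rain": 25,  "shine": -7},
}

# Probabilities p(theta) from Table 15-5
p = {"rain": 0.20, "shine": 0.80}

def expected_loss(loss_row, p):
    """L(a) = sum over theta of l(a, theta) p(theta), as in (15-14)."""
    return sum(loss_row[state] * p[state] for state in p)

L = {action: expected_loss(row, p) for action, row in loss.items()}
best = min(L, key=L.get)   # the action with the smallest expected loss
print({action: round(value, 2) for action, value in L.items()}, best)
```

This reproduces the expected losses 4, 5, and -.6 of (15-10), (15-12), and (15-13), and selects $a_3$. The same two functions work unchanged for any finite number of states and actions.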



1 For those who wish to review, we give an alternative intuitive calculation. In, say, 100
days he would get about 20 rainy days at $-20 each, yielding $-400; and about 80 shiny
days, at $+10 each, yielding $+800, for a sum of about $+400 in 100 days, or an average
of $4 per day.

[Figure 15-2: block diagram. Recoverable labels: prior distribution; probability of data $p(x/\theta)$; expected loss L(a), find the smallest.]

FIG. 15-2 The logic of Bayesian decisions to minimize expected loss.



The probabilities $p(\theta)$. These of course should represent the best
possible intelligence on the subject. For example, suppose the salesman
moves to another state, with weather probabilities as given in Table 15-1.
If he has no barometer, he will have to use the (prior) probabilities in this
table. But if he can consult the barometer (described in Table 15-2), then of
course the posterior probabilities $p(\theta/x)$ in Table 15-3 should be used. (See
Problem 15-3.)

The logic of Bayesian inference is laid out in the block diagram, Figure
15-2. Incidentally, in the calculation of the average loss $L(a)$ in (15-14) it
would not hurt to use $k\,p(\theta)$ instead of $p(\theta)$ as weights, where k is any positive
constant (independent of $\theta$ and a). For $k\,p(\theta)$ would generate losses $kL(a)$, which
would rank in the same order as the true losses $L(a)$, and hence point to the
same correct optimizing action. This is a very useful observation. Thus, for
example, our umbrella salesman need not undertake the last step in
calculating the posterior probabilities of rain $p(\theta/x_1)$ in (15-6) and (15-7); he
can forget about the denominator $p(x_1)$, and use (15-2) and (15-4) instead,
without affecting his decision.²
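The remark about the constant k can be checked directly: weighting the losses of Table 15-4 by the joint probabilities .36 and .12 from (15-2) and (15-4) should point to the same action as weighting by the posterior probabilities .75 and .25. The sketch below is illustrative only; the function name `rank_actions` is an assumption.

```python
# Loss function l(a, theta) from Table 15-4
loss = {
    "a1": {"rain": -20, "shine": 10},
    "a2": {"rain": 5,   "shine": 5},
    "a3": {"rain": 25,  "shine": -7},
}

def rank_actions(loss, weights):
    """Weighted loss for each action, and the action that minimizes it."""
    L = {a: sum(row[s] * weights[s] for s in weights) for a, row in loss.items()}
    return min(L, key=L.get), L

posterior_weights = {"rain": 0.75, "shine": 0.25}   # p(theta/x1), from (15-6) and (15-7)
joint_weights = {"rain": 0.36, "shine": 0.12}       # p(theta, x1), from (15-2) and (15-4)

best_post, L_post = rank_actions(loss, posterior_weights)
best_joint, L_joint = rank_actions(loss, joint_weights)
print(best_post, best_joint)
```

Here k = p(x1) = .48, so every entry of `L_joint` is .48 times the corresponding entry of `L_post`, and the ranking of the actions is unchanged.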

The loss function, $l(a, \theta)$. In our example, we assumed that monetary
loss is the appropriate consideration. This may be valid enough if the decision
is made ("game is played") over and over again: whatever minimizes the
expected loss in each game will minimize total expected loss in the long run.

2 i.e., attaching weights of .36 and .12 to his losses would yield the same result as weights 
of .75 and .25. 



Yet there are some decisions that are made only once, and then expected
monetary loss may not be the right criterion. To illustrate: suppose you
were offered (tax-free) a choice between

(a) $100,000 for sure, or

(b) a 1/2 chance (lottery ticket) on a $210,000 prize. (15-15)

Most people would prefer choice (a), even though its expected monetary
value

$100,000 (1) = $100,000

is less than that of choice (b):

$210,000 (1/2) = $105,000 (15-16)

The reason is that most people value the first hundred thousand much more
than the second. (The student should speculate on how he would spend the
first hundred thousand. Once these purchases have been made, less exciting
opportunities would be available for spending the second hundred thousand;
the sports car has already been bought, and so on.) Such a decision should
be based not on money itself as in (15-16), but rather on a subjective valuation
of money or the "utility" of money. As an illustration, Figure 15-3 shows
one author's subjective evaluation³ U(M). Since utility is the more appropriate
measure, the decision should be based on expected utility, rather than
expected money. Using Figure 15-3, the expected utilities of the two choices
are:

(a) $u_1 (1) = u_1$

(b) $u_2 (1/2) = 1.4\,u_1 (1/2) = .7\,u_1$ (15-17)

which is a clear victory for choice (a). In decision situations, a loss-of-utility
function of this kind should typically be used as our loss function $l(a, \theta)$;
hereafter we shall interpret losses in this way.

[Figure 15-3: utility U(M) plotted against money M (dollars), with the levels $u_1$ and $u_2 = 1.4\,u_1$ marked on the vertical axis and 210,000 marked on the horizontal axis.]

FIG. 15-3 Author's subjective evaluation of money.

3 This utility curve is highly personal, and temporary. It is defined empirically for an
individual by asking him which bets he prefers. In other words, many bets like (15-15) are
used to define utility, rather than vice-versa.
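The contrast between expected money and expected utility in (15-16) and (15-17) can be sketched numerically. The utility unit `U1 = 1.0` is arbitrary, and the utility of winning nothing is assumed to be zero, which is what the .7 in (15-17) implies; the names below are illustrative assumptions.

```python
U1 = 1.0   # utility of $100,000, in arbitrary units (assumption)

# Utility of each money outcome; U(0) = 0 is assumed, U(210,000) = 1.4 u1 as in (15-17)
utility = {0: 0.0, 100_000: U1, 210_000: 1.4 * U1}

# Each choice is a list of (probability, money) pairs
choice_a = [(1.0, 100_000)]
choice_b = [(0.5, 210_000), (0.5, 0)]

def expected_money(lottery):
    return sum(p * m for p, m in lottery)

def expected_utility(lottery):
    return sum(p * utility[m] for p, m in lottery)

for name, lottery in (("a", choice_a), ("b", choice_b)):
    print(name, expected_money(lottery), round(expected_utility(lottery), 2))
```

Choice (b) has the larger expected money (105,000 against 100,000), while choice (a) has the larger expected utility ($u_1$ against $.7\,u_1$), so the two criteria rank the choices in opposite order.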
PROBLEMS

15-3 Using the losses of Table 15-4, calculate the optimal action if 

(a) The only available probabilities are the prior probabilities of Table 
15-1. 

(b) The barometer reads "rain" (so that the posterior probabilities of 
Table 15-3 are relevant). 

(c) The barometer reads "shine." 

(d) Is the following a true or false summary of questions (a) to (c)
above? If false, correct it.

If the salesman must choose his action (order his merchandise)
before consulting a barometer, then $a_2$ (umbrellas and lemonade) is
best. However, if the barometer can be consulted first, then the salesman
should

Choose $a_1$ (umbrellas) if the barometer predicts "rain."
Choose $a_3$ (lemonade) if the barometer predicts "shine."
But a bright salesman could have seen this obvious solution without
going to all the trouble of learning about Bayesian decisions.

15-4 A farmer has to decide whether to sell his corn for use A or use B.
His losses depend on its water content (determined by the mill during
processing, after the farmer's decision has been made) according to
the following table.

Action      (state column headings not legible in the source)
Use A            -10            30
Use B             20            10

(a) If his only additional information is that, through long past
experience, his corn has been classified as dry one third of the time,
what should his decision be?

(b) Suppose he has developed a rough-and-ready means of determining
whether it is wet or dry, a method which is correct 3/4 of the time
regardless of the state of nature. If this indicates that his corn is "dry,"
what should his decision be? How much is this method worth, i.e., how
much does it reduce his expected loss?

=> 15-5 A school is to be built to serve 125 students, who live along a single
road. Let

$x_i$ = distance student i lives from origin

a = distance of school from origin

Thus

$(x_i - a)$ = distance of student i from school.

(a) Where is the optimum place (mean, median, mode, midrange?) to
build the school in order to

(1) Minimize the distance that the farthest student has to walk.

(2) Minimize the total walking done, i.e., minimize the sum of the
absolute deviations:

$\sum |x_i - a|$

*(3) Minimize the sum of the squared deviations:

$\sum (x_i - a)^2$

(Hint. Calculus suggests differentiating with respect to a, setting the
result equal to zero.)

(4) Maximize the number of students who live where the school is
built, and do not have to walk at all.

(b) Does the following accurately reflect your conclusions in
question (a) above? If not, correct it.

In (2) we are concerned only about the total walking done; walking
is considered a loss, no matter who does it. In (1), on the other hand,
only the walking done by the two extreme people is considered a loss;
walking done by any others is of no concern whatsoever. (3) is a
compromise; we imply that although all walking is some kind of loss, the
more a student has walked, the greater his loss in walking one more
mile. Thus the person who walks 3 miles ($x_i - a = 3$) contributes 9 to
the loss function, whereas the person who walks 1 mile contributes
only 1.



15-3 ESTIMATION AS A DECISION

In our earlier example the states $\theta$ (rain and shine) and actions a were
categorical (i.e., nonnumerical). But this was not an essential part of the
theory; in this section we consider a numerical example.



Example 

Suppose the judge at a beauty contest is asked to guess the height $\theta$
of the first contestant, whom he has never seen. Yet he is not in complete
ignorance; suppose he knows that the heights of contestants follow the
probability distribution $p(\theta)$ shown in Figure 15-4.

$\theta$ (inches)    $p(\theta)$
    64            .1
    65            .1
    66            .2
    67            .2
    68            .3
    69            .1

[Figure 15-4: bar graph of $p(\theta)$ against $\theta$, with the horizontal axis marked at 64, 66, and 68.]

FIG. 15-4 Prior distribution of heights $\theta$.



(i) Suppose, in order to encourage an intelligent guess, the judge is to
be fined $1 if he makes a mistake (no matter how large or small); "a miss is
as good as a mile." What should the rational judge guess?

(ii) Suppose the rules become more severe, by fining the judge $x for
an error of x inches; the greater his error, the greater his loss. What is his
rational guess?

(iii) Suppose the rules are made even more severe, by fining the judge
$$x^2$ for an error of x inches; this is the same as (ii), except that the loss
becomes more severe as his error increases. What is his rational guess now?

Solution. (i) The most likely (modal) value 68.

(ii) The median value 67.

(iii) The mean value 66.8.

Thus (i), (ii), and (iii) are like (4), (2), and (3) in the schoolhouse Problem
15-5, with the same solution.
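The three answers can be confirmed by brute force: compute the expected fine for each candidate guess under each rule and pick the smallest. The sketch below uses the distribution tabulated with Figure 15-4; restricting the candidate guesses to a grid of tenths of an inch between 64 and 69 is an assumption made to keep the search finite, and the names are illustrative.

```python
# Prior distribution p(theta) of heights, from Figure 15-4
heights = {64: 0.1, 65: 0.1, 66: 0.2, 67: 0.2, 68: 0.3, 69: 0.1}

# The three fines, as functions of the guess a and the true height theta
losses = {
    "0-1":       lambda a, theta: 0 if a == theta else 1,   # rule (i)
    "absolute":  lambda a, theta: abs(a - theta),           # rule (ii)
    "quadratic": lambda a, theta: (a - theta) ** 2,         # rule (iii)
}

def expected_loss(a, loss):
    """Expected fine for guess a: sum over theta of l(a, theta) p(theta)."""
    return sum(p * loss(a, theta) for theta, p in heights.items())

# Candidate guesses 64.0, 64.1, ..., 69.0 (grid spacing is an assumption)
candidates = [k / 10 for k in range(640, 691)]

best = {name: min(candidates, key=lambda a: expected_loss(a, loss))
        for name, loss in losses.items()}
print(best)
```

The search returns 68 for the 0-1 fine (the mode), 67 for the absolute fine (the median), and 66.8 for the quadratic fine (the mean), in agreement with the solution above.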

To translate this into the familiar language of decision theory, the girl's
height is the state of nature $\theta$, and the guessed height (estimate) is the action
a to be taken. The fine the judge must pay is the loss function $l(a, \theta)$; since
a and $\theta$ are numerical, the loss function is most conveniently given by a
formula rather than a table. Each of the 3 loss functions, along with its
corresponding optimal estimator, is shown in Table 15-7.



Table 15-7 How the Optimal Estimator of $\theta$ Depends on the
Loss Function

If the Loss Function $l(a, \theta)$ is:               Then the Corresponding
                                                  Optimal Estimator a is:

(i)   0 if $a = \theta$ exactly, 1 otherwise          Mode of $p(\theta)$
      ("the 0-1 loss function")
(ii)  $|a - \theta|$                                  Median
(iii) $(a - \theta)^2$                                Mean



The "quadratic" loss function (iii) is the one that is usually used in
decision theory. It is graphed in Figure 15-5. It is justified not only by its
intuitive appeal, but also by its attractive mathematical properties. For
example, it is easily differentiated (an important requirement in minimization
problems); on the other hand (i) obviously cannot be differentiated, nor can
(ii), since it is an absolute value function.

We reemphasize that the probability distribution $p(\theta)$ used in the decision
process ought to reflect the best available information. Thus we may be
forced to use the prior distribution $p(\theta)$ if we have not yet collected any data,
but after data is collected, the posterior distribution $p(\theta/x)$ is appropriate.




FIG. 15-5 The quadratic loss function.

PROBLEMS



These are extensions of Problem 15-2. 

15-6 Suppose you have to guess the height of the man drawn in Problem 15-2,
with only the prior distribution $p(\theta)$ known. Find the optimal estimate
of $\theta$

(a) Assuming $l(a, \theta) = 0$ if $a = \theta$,
$= 1$ otherwise. (15-18)

(b) Assuming $l(a, \theta) = |a - \theta|$. (15-19)

(c) Assuming $l(a, \theta) = (a - \theta)^2$. (15-20)

15-7 Repeat Problem 15-6, after the man's height has been crudely measured
as x = 74, so that the posterior distribution $p(\theta/x)$ is relevant.



15-4 ESTIMATION: BAYESIAN VERSUS CLASSICAL 

This comparison is best shown with an extended example, illustrated in 
Figure 15-6; from this we shall draw conclusions later. 



(a) Example 

Suppose it is essential to estimate the length $\theta$ of a beetle accidentally
caught in a delicate piece of machinery. A measurement x is possible, using
a device which is subject to some error; suppose x is normally distributed
about the true value $\theta$, with $\sigma = 1$. Suppose x turns out to be 20 mm.

Question (a). What is the classical 95% confidence interval for $\theta$?

Solution. Our information on the sampling distribution of x, i.e.,

$p(x/\theta) = N(\theta, \sigma^2)$, specifically $N(\theta, 1)$ (15-21)

can be "turned around" to construct the following confidence interval for $\theta$:

$\theta = 20 \pm 1.96(1) = 20 \pm 1.96$ (15-22)

and, of course:

point estimate of $\theta$ = 20 (15-23)

Question (b). Suppose we take the effort to find out from a biologist
that the population of all beetles has a normally-distributed length, with
mean $\theta_0 = 25$ mm and variance $\sigma_0^2 = 4$. How can this be used to define a
posterior distribution of $\theta$?

Solution. It will be useful to develop a general formula applying for
any $\theta_0$, $\sigma_0$, etc., and then solve it for our specific example. Since our prior
distribution is

$p(\theta) = N(\theta_0, \sigma_0^2)$ (15-24)

and the distribution of our empirical evidence x is:

$p(x/\theta) = N(\theta, \sigma^2)$ (15-21) repeated

it can be shown⁴ that the posterior distribution is also normal; specifically:

$p(\theta/x) = N(ab, a)$ (15-36)

where

$\frac{1}{a} = \frac{1}{\sigma_0^2} + \frac{1}{\sigma^2}$ (15-37)

$b = \frac{\theta_0}{\sigma_0^2} + \frac{x}{\sigma^2}$ (15-38)

4 (15-24) and (15-21) may be written:

$p(\theta) = K_1 e^{-(1/2\sigma_0^2)(\theta - \theta_0)^2}$ (15-25)

$p(x/\theta) = K_2 e^{-(1/2\sigma^2)(x - \theta)^2}$ (15-26)

where $K_1$ and $K_2$ and other similar constants introduced in this footnote are of a form not
critical to the argument. Since

$p(x, \theta) = p(\theta)\, p(x/\theta)$ (15-27)

we can use (15-25) and (15-26) to write

$p(x, \theta) = K_1 K_2\, e^{-(1/2)[(1/\sigma_0^2)(\theta^2 - 2\theta\theta_0 + \theta_0^2) + (1/\sigma^2)(x^2 - 2x\theta + \theta^2)]}$ (15-28)

Now consider only the exponent, which may be rearranged to

$-\frac{1}{2}\left[\theta^2\left(\frac{1}{\sigma_0^2} + \frac{1}{\sigma^2}\right) - 2\theta\left(\frac{\theta_0}{\sigma_0^2} + \frac{x}{\sigma^2}\right) + K_3\right]$ (15-29)

Let

$\frac{1}{a} = \frac{1}{\sigma_0^2} + \frac{1}{\sigma^2}$ (15-30)

$b = \frac{\theta_0}{\sigma_0^2} + \frac{x}{\sigma^2}$ (15-31)

Using these definitions, the exponent (15-29) can be written

$-\frac{1}{2a}\left[\theta^2 - 2ab\theta + K_4\right]$ (15-32)

$= -\frac{1}{2a}(\theta - ab)^2 + K_5$ (15-33)

Finally we use this to write (15-28) as

$p(x, \theta) = K_6\, e^{-(1/2a)(\theta - ab)^2}$ (15-34)

and

$p(\theta/x) = \frac{p(x, \theta)}{p(x)} = K_7\, e^{-(1/2a)(\theta - ab)^2}$ (15-35)

This means that $\theta$, given x, is a normal variable with mean ab and variance a, provided
a appears appropriately in $K_7$. But it must, since $p(\theta/x)$ is a bona fide probability function
(integrating to 1), and $K_7$ is just the scale factor necessary to ensure this.



Now apply this to our example. Since

$\sigma_0^2 = 4$
$\sigma^2 = 1$
$\theta_0 = 25$
$x = 20$

it follows that

$\frac{1}{a} = \frac{1}{4} + \frac{1}{1} = \frac{5}{4}$

and

$b = \frac{25}{4} + \frac{20}{1} = \frac{105}{4}$

Thus:

mean = ab = 21.0
variance = a = .8

Hence the posterior distribution may be formally written

$p(\theta/x = 20) = N(21, .8)$ (15-39)

compared with the prior:

$p(\theta) = N(25, 4)$ (15-40)

The Bayesian logic is shown in Figure 15-6. A prior distribution is
adjusted to take account of observed data (x), with the weight attached to the
observed x depending on its probability $p(x/\theta)$. The result is the posterior
distribution, with mean (21) falling, as expected, between the prior mean (25)
and the observed value (20). (As a bonus, variance is reduced in the posterior
distribution. Although this does not always happen, it is evident that it must
happen for normal distributions; for (15-37) shows that the posterior variance
a is less than $\sigma_0^2$, and also less than $\sigma^2$, incidentally.)
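The updating rule (15-36) to (15-38) is short enough to sketch as a function. The function and argument names are illustrative assumptions; the inputs are the beetle example's prior mean 25, prior variance 4, observation 20, and measurement variance 1.

```python
def normal_posterior(theta0, var0, x, var):
    """Posterior mean and variance for a normal prior and one normal observation.

    theta0, var0: prior mean and variance
    x, var:       observed value and its measurement variance
    """
    a = 1 / (1 / var0 + 1 / var)     # (15-37): 1/a = 1/var0 + 1/var
    b = theta0 / var0 + x / var      # (15-38)
    return a * b, a                  # (15-36): mean ab, variance a

post_mean, post_var = normal_posterior(theta0=25, var0=4, x=20, var=1)
print(round(post_mean, 3), round(post_var, 3))
```

This reproduces the posterior mean 21 and variance .8 of (15-39). Because 1/a is a sum of two positive terms, the returned variance is always smaller than both the prior variance and the measurement variance, as noted in the text.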

Question (c). With the posterior distribution (15-39) now in hand,
defining a Bayesian estimate of $\theta$ requires only a loss function. Suppose this
is the quadratic loss function; what is the Bayesian point estimator of $\theta$?
Find also the 95% probability interval for $\theta$.

[Figure 15-6: the prior $p(\theta) = N(\theta_0, \sigma_0^2) = N(25, 4)$ and the posterior $p(\theta/x = 20) = N(21, .8)$ drawn on a common $\theta$ axis. Marked above: the Bayesian 95% probability interval $21 \pm 1.96\sqrt{.8}$. Marked below: the classical 95% confidence interval $x \pm 1.96\sigma = 20 \pm 1.96$, based only on the evidence x and its distribution $p(x/\theta) = N(\theta, \sigma^2) = N(\theta, 1)$.]

FIG. 15-6 Bayesian versus classical estimation.

Solution. For the quadratic loss function, the posterior mean (21) is
the optimal estimator. (Note that because $p(\theta/x)$ is normal, this is also the
posterior median and mode, so that all the loss functions in Table 15-7
yield the same answer. This is reassuring, and frequently happens in practice.)
To construct a 95% probability interval, we know from (15-39) that,
given the observation x = 20, there is a 95% probability that $\theta$ will fall in the
interval

$21 \pm 1.96\sqrt{.8} = 21 \pm 1.76$

Note that this is narrower (more precise) than the classical interval (15-22),
reflecting the value of the prior information $p(\theta)$.



15-8 As our means of measuring (beetles) becomes more and more precise
($\sigma \to 0$), show that in the posterior distribution $p(\theta/x)$,

the mean $\to x$
the variance $\to 0$ (15-41)

In other words, if we use an errorless measuring device, we can be
certain that the true $\theta$ will be its measured value x.
=> 15-9 Using Figure 15-6, what would you expect intuitively of the posterior 
mean if two independent measurements of the beetle had yielded an 
average of 20 mm? (For an extension of your answer, see the section 
immediately following.) 



(b) Generalization 

Suppose that a sample of n independent measurements $x_1, x_2, \ldots, x_n$
can be taken rather than just a single x. Using the sample mean $\bar{x}$, what now
is the Bayesian estimate of $\theta$? In particular, what happens as we get more and
more observations ($n \to \infty$)?

This problem may be solved, using (15-36) to (15-38) with one important
change. Since our data now is $\bar{x}$ instead of x we must make this substitution
in (15-38), and also substitute

$\frac{\sigma^2}{n}$ (15-42)

for $\sigma^2$ in (15-37) and (15-38). [Of course, (15-42) is just the variance of a
sample mean when $\sigma^2$ is the variance of a single observation.] Thus, our
generalized definition of a and b in (15-36) is:

i = \ + for n = 1, this reduces to (15-37) (15-43) 

a <7o o- 

b = % + ^ for n = 1, this reduces to (15-38) (15-44) 



In the limit, as sample size n → ∞:

$$\frac{1}{a} \simeq \frac{n}{\sigma^2} \tag{15-45}$$

$$b \simeq \frac{n\bar{x}}{\sigma^2} \tag{15-46}$$

Incidentally, exactly these same results follow, whether n → ∞, or

$$\sigma_0 \to \infty \tag{15-47}$$

Thus, evaluating (15-36):

$$\text{posterior mean} = ab \simeq \bar{x} \tag{15-48}$$

$$\text{posterior variance} = a \simeq \frac{\sigma^2}{n} \tag{15-49}$$

Again the normality of this posterior distribution ensures that its mean,
mode, and median coincide. Hence, regardless of which loss function we may
use:

$$\text{Bayesian estimator of } \theta \simeq \bar{x} \tag{15-50}$$

$$\text{95\% probability interval} \simeq \bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}} \tag{15-51}$$

We conclude that, as n → ∞, Bayesian estimation approaches the classical.
This is exactly as it should be: as more and more data are collected, less and
less weight need be attached to prior information; and with an unlimited
sample, prior information is completely disregarded, as in classical estimation.

The classical and Bayesian approaches are compared in more detail in Table
15-8.

Table 15-8 Relation of Classical and Bayesian Estimation. (Although Normality
is Assumed, Results are Instructive for Other Cases Too)

                                                        And Gets the Answer:
Procedure   to Estimate            Requires, Along      In Our Example   In the Limit, as
                                   With Observed x      (n = 1)          n → ∞ or σ₀ → ∞

Classical   Point estimate         p(x/θ)               20               x̄
            Confidence interval    p(x/θ)               20 ± 1.96        x̄ ± 1.96 σ/√n

Bayesian    Point estimate         p(x/θ), p(θ),        21               Same as classical
                                   and loss function
            Probability interval   p(x/θ), p(θ)         21 ± 1.76        Same as classical

We now turn to the other condition that leads to the same result. Bayes
estimators also approach the classical if prior information is very vague (i.e.,
if σ₀ → ∞, as stated in (15-47)). Thus the less the prior distribution tells us,
the less weight we attach to it. To sum up, the two reasons for completely
disregarding prior information are (1) if present data is in unlimited supply,
or (2) if prior information is useless.
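
The formulas (15-43) and (15-44) are easy to check numerically. The following is a minimal Python sketch, not part of the original text; the function names `normal_posterior` and `probability_interval` are illustrative assumptions, and the large sample size of 10,000 is chosen only to show the limiting behavior.

```python
import math

def normal_posterior(theta0, var0, xbar, var, n=1):
    """Posterior mean and variance for a normal prior and normal data.

    Uses (15-43) and (15-44): 1/a = 1/var0 + n/var and
    b = theta0/var0 + n*xbar/var, so that the posterior mean is a*b
    and the posterior variance is a.
    """
    a = 1.0 / (1.0 / var0 + n / var)
    b = theta0 / var0 + n * xbar / var
    return a * b, a

def probability_interval(mean, variance, z=1.96):
    """Interval mean +/- z * (posterior standard deviation)."""
    half_width = z * math.sqrt(variance)
    return mean - half_width, mean + half_width

# Beetle example of Figure 15-6: prior N(25, 4), one observation x = 20, sigma^2 = 1.
mean_1, var_1 = normal_posterior(25, 4, 20, 1, n=1)
low_1, high_1 = probability_interval(mean_1, var_1)

# Same prior and sample mean, but a hypothetical very large sample:
# the posterior mean is pulled almost entirely to the sample mean.
mean_big, var_big = normal_posterior(25, 4, 20, 1, n=10_000)

print(mean_1, var_1, (low_1, high_1))
print(mean_big, var_big)
```

With n = 1 the sketch reproduces the posterior mean of 21 and variance of .8 used above; with the large sample the posterior mean is close to x̄ and the variance close to σ²/n, as (15-48) and (15-49) state.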



(c) Is θ Fixed or Variable?

In this chapter we regard the target to be estimated as a random variable,
for example the beetle's length θ in Figure 15-6. Yet in all preceding chapters,
we have regarded the target as a fixed parameter, for example the average
height μ of American men. Nevertheless, we may often find it useful to think
of μ as having a subjective probability distribution, with this being a
description of the betting odds we would give that μ is bracketed by any two given
values (see the description of subjective probability in Section 3-6). In the
problem of men's heights it may be helpful to boil down our best prior
knowledge of μ into a prior subjective distribution of μ. Then the posterior
subjective distribution of μ would reflect how the sampling data changed the
betting odds.



PROBLEMS 

15-10 Following the beetle example in Section 15-4(b), suppose that:

σ₀² = 100
θ₀ = 25
σ² = 1

and a sample of 4 independent observations on the trapped beetle
yields an average length x̄ of 20 mm.

(a) Calculate the Bayesian point estimate for θ, the length of the beetle.
For two reasons this estimate is closer to the observed value of 20 than
the Bayesian estimate (21) in Figure 15-6. Explain.

(b) Calculate the Bayesian 95% probability interval for θ.

=> 15-11 Suppose that, in a random sample of 10 students on an American
college campus, you find only one is a Democrat. Which would you
rather quote as your "best estimate" of the proportion π of Democrats
in the population (whole campus):
(a) The classical estimate, x/n = 1/10,

or

(b) The Bayesian estimate which, assuming this subjective prior
distribution:

[graph of the prior distribution p(π)]

and a quadratic loss function, yields⁵ the estimate

$$\frac{x + 3}{n + 6} = \frac{4}{16} = .25$$

15-5 CRITIQUE OF BAYESIAN METHODS
(a) Strength

Bayesian inference is the optimal statistical method (in the sense of
minimizing loss of utility) if there is a known prior distribution p(θ) and loss
function l(a, θ). Compared to classical methods, Bayesian methods often
yield shorter interval estimates (e.g., Table 15-8), more credible point
estimates (e.g., Problem 15-11), and more appropriate hypothesis tests (e.g.,
Problem 15-13 below). Bayesian methods are particularly useful in the social
sciences and business, where sample size is often very small, and Bayesian
methods differ considerably from the classical methods.



(b) Weakness

The major criticism of Bayesian estimation is that it is highly subjective.
The prior p(θ) and loss function l(a, θ) are usually not known⁶, nor is there
often any hope at all of specifying them exactly. For example, what is the
loss function for an economist measuring a population's unemployment rate,
with inevitable statistical error? We have already seen that this is not as
serious a difficulty as it seems at first glance, since in many problems any of
the three loss functions of Table 15-7 leads to the same Bayes estimator.
Then selecting the "wrong" loss function would still lead to the right
estimator.

The other information required, the prior distribution p(θ), usually
remains unknown too. Moreover, there are often difficulties in interpreting
θ as a random variable; an economist cannot regard the unemployment rate
θ as a random variable (as though it is drawn from a bowlful of chips).
Instead he must think of p(θ) as a subjective distribution reflecting his prior
betting odds on θ. But he may not view even this as entirely satisfactory.

Since Bayesian techniques require a rough-and-ready specification of
these unknown functions, they do indeed involve subjective judgments. The
interesting observation, however, is that classical methods, which require no
such explicit specifications, are by no means free of the same subjective
elements. One of the major contributions of the Bayesian method has been
to lay bare the assumptions implicit in classical techniques. As we shall see
in the next section, some of these fare badly when exposed; in extreme cases
any intelligent guess is substantially better.⁷

5 For proof, see for example, Lindgren, B. W., Statistical Theory, 2nd Ed. New York:
Macmillan, 1967.

6 The other required information for Bayesian inference is p(x/θ), the distribution of sample
data x. But this can often be borrowed from classical statistics. [For example, recall how
we borrowed a classical deduction in (15-42).]



*(c) Classical Methods as Bayesian Methods in Disguise

Suppose a Bayesian wishes to estimate θ with no prior knowledge. In
desperation he might use the "equiprobable" prior:

$$p(\theta) = c, \text{ a constant} \tag{15-52}$$

Further suppose that, rather than using the familiar and attractive quadratic
loss function, he opts for the 0-1 loss function. He thus will estimate θ with
the mode of the posterior distribution:

$$p(\theta/x) = \frac{p(\theta)\,p(x/\theta)}{p(x)} \tag{15-53}$$

[(15-8) repeated]

But because of (15-52):

$$p(\theta/x) = \left[\frac{c}{p(x)}\right] p(x/\theta) \tag{15-54}$$

To find the mode, he finds the value of θ which makes p(θ/x) largest. But
since the bracketed term [c/p(x)] doesn't depend on θ, he only needs to find:

the value of θ which makes p(x/θ) largest (15-55)

But this statement is recognized as just the definition of the classical MLE.⁸

From this, we conclude that a classical statistician who uses MLE is
getting the same result as a Bayesian using the 0-1 loss function and an
"equiprobable" prior. This seems a very unflattering description of MLE,
since neither this prior nor this loss function is easy to justify. But in many
cases, MLE is not nearly this restrictive. If p(θ/x) is unimodal and symmetric,
as it often is, then its mean, mode and median coincide; in such circumstances
MLE is equivalent to Bayesian estimation using any of our three loss
functions.
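
The argument of (15-52) to (15-55) can be checked numerically. The sketch below is an illustration only, not the authors' own computation: it assumes a binomial likelihood for a proportion π (as in Problem 15-11), evaluates the posterior on a grid of 1,001 points, and uses the hypothetical helper names `binomial_likelihood` and `posterior_mode_flat_prior`.

```python
def binomial_likelihood(pi, x, n):
    """Likelihood p(x/pi) up to a constant factor: pi^x (1 - pi)^(n - x)."""
    return pi ** x * (1 - pi) ** (n - x)

def posterior_mode_flat_prior(x, n, steps=1000):
    """Mode of the posterior on a grid, using the flat prior of (15-52)."""
    grid = [i / steps for i in range(steps + 1)]
    c = 1.0  # the "equiprobable" prior p(pi) = c
    unnormalized = [c * binomial_likelihood(pi, x, n) for pi in grid]
    total = sum(unnormalized)  # plays the role of p(x) in (15-53)
    posterior = [w / total for w in unnormalized]
    best = max(range(len(grid)), key=lambda i: posterior[i])
    return grid[best]

# One Democrat in a sample of 10: the posterior mode equals the MLE x/n.
mode = posterior_mode_flat_prior(x=1, n=10)
mle = 1 / 10
print(mode, mle)
```

Because the constant c and the normalizing total do not depend on π, the grid point that maximizes the posterior is the same one that maximizes the likelihood, which is the content of (15-55).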

As if the discussion of MLE above has not been damaging enough, we
consider an even more questionable application. Suppose we are estimating
a population proportion π (as in Problem 15-11). It has been proved⁹ that
a classical statistician using MLE will arrive at the same result (estimating
π with x/n) as a Bayesian using the quadratic loss function and the prior
distribution shown in Figure 15-7.

FIG. 15-7 [graph of the prior distribution p(π)]

This prior distribution is obviously hopeless, the worst we have yet
encountered. (It means that a huge majority of students are Republican, or
a huge majority are Democratic.) We recall that we may have been
uncomfortable about the prior distribution graphed in Problem 15-11; but it
was vastly better than this. This explains why MLE can occasionally give a
very strange result in a small sample; our intuition was correct in leading us
to reject it in Problem 15-11.

In conclusion, although MLE has many attractive characteristics [see
Section 7-3(f)], these are large sample properties; in small sample estimation,
it should be used with great caution.

7 A further criticism of Bayesian methods is that there is too great a cost of computing
Bayesian estimates (not to mention learning about them); but this criticism is being
weakened with the advent of better computer programs.

8 Note that in developing MLE in Section 7-3, the notation p(x; θ) was used, equivalent
to p(x/θ) used here.



*15-6 HYPOTHESIS TESTING AS A BAYESIAN DECISION
(a) Example

Suppose there are two species of beetle. Species S₀ is harmless, while
species S₁ is a serious pest, requiring an expensive insecticide. A beetle is
sighted in a new, as yet uninfested territory; but this sighting provides no
information useful in establishing whether the beetle was S₀ or S₁. Should
insecticide be used or not?

9 Again, see for example, Lindgren, B. W., Statistical Theory, 2nd Ed., New York:
Macmillan, 1967.

To answer this question, we need to know the costs l(a, θ) of a wrong
decision, and the probabilities p(θ) of it being one species or the other;
these are given in Table 15-9. Obviously action a₀ (don't spray) is appropriate
if the state of nature is S₀ (harmless beetle) while a₁ is appropriate if the state
is S₁.

Table 15-9 Probabilities of States of Nature, and
Loss Function

p(θ)                      .7             .3

State θ                   S₀             S₁
                          (Harmless      (Harmful
Action a                  Species)       Species)

a₀ (don't spray)          5              100
a₁ (spray)                15             15

Question (a). Should we spray, or not?

Solution. It will be convenient to generalize the loss table, calling l(aᵢ, θⱼ) =
lᵢⱼ for short. As always, we calculate the expected losses L(a), by weighting
elements in each row of this table by their appropriate probabilities:

$$L(a_0) = p(\theta_0)\,l_{00} + p(\theta_1)\,l_{01} \tag{15-56}$$

$$= (.7)5 + (.3)100 = 33.5$$

and

$$L(a_1) = (.7)15 + (.3)15 = 15 \leftarrow \min$$

Thus the optimal action is a₁: spray.

We see that this problem may be expressed in terms of hypothesis
testing: action a₀ (don't spray) may be interpreted as accepting H₀ (harmless
beetle), while action a₁ (spray) may be interpreted as accepting H₁ (harmful
beetle).

Question (b). Suppose that prior information about the beetles is that
species S₀ is 9 times as common as S₁. Given this new information about
p(θ), what is the optimum action?

Solution. Don't spray, as shown in Table 15-10.

In this case the harmful species is so rare, that it is better to "take the
risk," i.e., assume the beetle is harmless as our working hypothesis.



Table 15-10

p(θ)                      .9        .1

State θ                   S₀        S₁        L(a)
Action a

a₀ (Don't spray)          5         100       14.5 ← min
a₁ (Spray)                15        15        15
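
The expected-loss calculations for questions (a) and (b) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of the original text: the dictionary layout and the names `LOSS`, `expected_losses` and `best_action` are hypothetical, while the numbers are those of Tables 15-9 and 15-10.

```python
# Loss table of Table 15-9, indexed as LOSS[action][state].
LOSS = {
    "don't spray": {"S0": 5, "S1": 100},
    "spray": {"S0": 15, "S1": 15},
}

def expected_losses(prior):
    """Weight each row of the loss table by the prior probabilities p(theta)."""
    return {
        action: sum(prior[state] * row[state] for state in prior)
        for action, row in LOSS.items()
    }

def best_action(prior):
    """Action with the smallest expected loss L(a)."""
    losses = expected_losses(prior)
    return min(losses, key=losses.get)

prior_a = {"S0": 0.7, "S1": 0.3}   # question (a)
prior_b = {"S0": 0.9, "S1": 0.1}   # question (b): S0 nine times as common

print(expected_losses(prior_a), best_action(prior_a))
print(expected_losses(prior_b), best_action(prior_b))
```

Changing only the prior flips the decision from spraying to not spraying, which is the point of question (b).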



Question (c). So far we have assumed no statistical information on the
beetle that has been sighted. Now suppose it has been captured, with its
length measured as 27 mm. Suppose further that the two species are
distinguishable by their lengths, which are normal random variables with σ = 4,
and means θ₀ = 25 and θ₁ = 30 respectively. What now is the best action,
a posteriori? [Assume p(θ) and losses given in Table 15-9.]

Solution. It will be most instructive to develop a general solution, leaving
substitution of particulars to the end. Losses are calculated as in (15-56),
substituting the appropriate posterior probabilities p(θ/x) for p(θ):

$$L(a_0) = p(\theta_0/x)\,l_{00} + p(\theta_1/x)\,l_{01} \tag{15-57}$$

Similarly

$$L(a_1) = p(\theta_0/x)\,l_{10} + p(\theta_1/x)\,l_{11} \tag{15-58}$$

We choose action a₀ if and only if

$$L(a_0) < L(a_1) \tag{15-59}$$

Substituting (15-57) and (15-58) into (15-59), and collecting like terms, we
obtain the criterion: choose a₀ iff

$$p(\theta_1/x)\,[l_{01} - l_{11}] < p(\theta_0/x)\,[l_{10} - l_{00}] \tag{15-60}$$

The bracketed quantities

$$r_0 = l_{10} - l_{00} \tag{15-61}$$

and

$$r_1 = l_{01} - l_{11} \tag{15-62}$$

are called regrets. It is easy to see why: the regret if the beetle is harmless
(r₀) is the extra loss incurred if we used the wrong action, i.e., sprayed (a₁),
rather than not sprayed (a₀). Evaluating (15-61) we see that r₀ is 15 − 5 = 10,
the difference in column elements in Table (15-10). Our much larger regret
r₁ = 100 − 15 = 85 represents our net loss if we employ the wrong action (don't
spray) on a beetle that turns out to be harmful.

Returning to (15-60), it may now be written in terms of regrets:

$$p(\theta_1/x)\,r_1 < p(\theta_0/x)\,r_0 \tag{15-63}$$

i.e.

$$\frac{p(\theta_1/x)}{p(\theta_0/x)} < \frac{r_0}{r_1} \tag{15-64}$$

The posterior probabilities in this equation can now be expressed in
full using (15-8), and noting that p(x) cancels,

$$\frac{p(\theta_1)\,p(x/\theta_1)}{p(\theta_0)\,p(x/\theta_0)} < \frac{r_0}{r_1} \tag{15-65}$$

Recall that this is our criterion for action a₀ (don't spray), interpreted as
acceptance of H₀: (beetle harmless, θ = θ₀). An appropriate cross-multiplication
of (15-65) leads us to an important theorem, called the

Bayesian Likelihood-Ratio Criterion:
Accept H₀ iff

$$\frac{p(x/\theta_1)}{p(x/\theta_0)} < \frac{r_0\,p(\theta_0)}{r_1\,p(\theta_1)} \tag{15-66}$$

where rᵢ is the regret if θᵢ is true, p(θᵢ) is the prior distribution, and p(x/θᵢ)
is the distribution of the observed data.

As stated earlier, p(x/θᵢ) is often borrowed directly from classical
deduction, and is the distribution of the estimator x, given the parameter θᵢ.
Specifically, it appeared in maximum likelihood estimation in Section 7-3 as
the likelihood function. Thus the left-hand side of (15-66) is called the
"likelihood ratio."

This criterion is certainly reasonable. If θ₁ is a sufficiently implausible
explanation of the data [i.e., p(x/θ₁) is sufficiently less than p(x/θ₀)], then the
likelihood ratio will be small enough to satisfy this inequality. Thus H₀ will
be accepted, as it should be.

To illustrate further, consider the very simple case in which the regrets
(penalties for error) are assumed equal, and the prior probabilities p(θ₀)
and p(θ₁) are also assumed equal. The right-hand side of (15-66) becomes 1;
thus H₀ is accepted if the likelihood of θ₀ generating the sample [p(x/θ₀)] is
greater than the likelihood of θ₁ generating the sample [p(x/θ₁)]. Otherwise,
the alternative H₁ is accepted. In simplest terms: we select the hypothesis
which is more likely to generate the observed x. In this sense, this could be
viewed as hypothesis testing, within a maximum likelihood context, shown
in Figure 15-8a. In b we make the further assumption that the two likelihood
functions (centered on θ₀ and θ₁ respectively) have the same normal¹⁰
distribution. Then criterion (15-66) reduces to

Accept H₀ iff x is observed closer to θ₀ than θ₁ (15-67)

Again, a very reasonable result.

[Figure 15-8 panels: (a) the two likelihood functions p(x/θ₀) and p(x/θ₁),
with the ranges of x marked "Accept H₀ if x observed in this range" and
"Accept H₁"; (b) the same with normal likelihoods, a critical value
separating "Accept H₀" from "Accept H₁".]

FIG. 15-8 Hypothesis testing, using the Bayesian likelihood ratio [special case when
r₀ = r₁ and p(θ₀) = p(θ₁)]. (a) For any p(x/θᵢ). (b) If p(x/θᵢ) = N(θᵢ, σ²).

10 In fact normality is not required; the two distributions need only be unimodal and
symmetric.

Evaluating (15-66) when r₀ ≠ r₁ or p(θ₀) ≠ p(θ₁) is obviously a more
complicated matter. To keep things simple, we assume that θ₀ < θ₁, and that
p(x/θ₀) and p(x/θ₁) are normal with a common σ. Then (15-66), our criterion
for accepting H₀, becomes

$$\frac{e^{-(1/2\sigma^2)(x-\theta_1)^2}}{e^{-(1/2\sigma^2)(x-\theta_0)^2}} < \frac{r_0\,p(\theta_0)}{r_1\,p(\theta_1)} \tag{15-68}$$

This may be reduced¹¹ to: accept H₀ iff

$$x < \frac{\sigma^2}{\theta_1 - \theta_0}\,\log\left[\frac{r_0\,p(\theta_0)}{r_1\,p(\theta_1)}\right] + \frac{\theta_0 + \theta_1}{2} \tag{15-69}$$

(The logarithms used throughout this section are natural logarithms, to the
base e. The common logarithms of appendix Table VIII can be converted
to natural logarithms by multiplying by 2.30.) We note that the right-hand
side of (15-69) is independent of x; as in all hypothesis tests, this can be
evaluated prior to observing x. At the same time it does depend, as expected,
on background information p(θ) and regrets. Moreover, when r₀ = r₁ and
p(θ₀) = p(θ₁), then the log term disappears and this reduces to the special
case (15-67).

Finally, the particular problem of the beetle spray can now be solved.
Substituting the information given in question (c) and Table 15-9 into
(15-69) yields: accept H₀ iff

$$x < \frac{16}{5}\,\log\left[\frac{10(.7)}{85(.3)}\right] + 27.5 \tag{15-75}$$

$$< 3.2 \log(.275) + 27.5 \tag{15-76}$$

$$< 3.2(-1.29) + 27.5 \tag{15-77}$$

$$< 23.4 \tag{15-78}$$

Since we observed a 27 mm beetle, this condition is violated, and we reject
H₀. But what does seem strange is the critical value in (15-78): even if the
beetle were 25 mm (exactly θ₀, the length we would expect of a harmless
beetle) we would still spray. With further thought we see that this answer
is, after all, reasonable. The heavy damage involved if the beetle turns out
to be harmful induces us to spray to avoid this risk. [From (15-75) we confirm
that it is in fact the relative size of the two regrets that explains this result.]

11 Details: taking logarithms of (15-68):

$$-\frac{1}{2\sigma^2}(x - \theta_1)^2 + \frac{1}{2\sigma^2}(x - \theta_0)^2 < K \tag{15-70}$$

where

$$K = \log\left[\frac{r_0\,p(\theta_0)}{r_1\,p(\theta_1)}\right] \tag{15-71}$$

Rearranging (15-70):

$$\frac{1}{2\sigma^2}\left(2\theta_1 x - 2\theta_0 x - \theta_1^2 + \theta_0^2\right) < K \tag{15-72}$$

$$2(\theta_1 - \theta_0)x - (\theta_1^2 - \theta_0^2) < 2\sigma^2 K \tag{15-73}$$

i.e., accept H₀ iff:

$$x < \frac{\sigma^2 K}{\theta_1 - \theta_0} + \frac{\theta_1 + \theta_0}{2} \tag{15-74}$$

Using the definition of K in (15-71), (15-74) may be written as (15-69).
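
As a numerical check of (15-69) and (15-75) to (15-78), the critical value can be computed directly. The following Python sketch is illustrative only; the function name `critical_value` and its parameter names are assumptions, while the inputs are the beetle figures from question (c) and Table 15-9.

```python
import math

def critical_value(theta0, theta1, sigma, r0, r1, p0, p1):
    """Right-hand side of (15-69): accept H0 iff the observed x is below it.

    Assumes theta0 < theta1 and normal likelihoods with a common sigma.
    """
    k = math.log((r0 * p0) / (r1 * p1))  # K of (15-71), a natural logarithm
    return sigma ** 2 / (theta1 - theta0) * k + (theta0 + theta1) / 2

# Beetle problem: theta0 = 25, theta1 = 30, sigma = 4,
# regrets r0 = 10 and r1 = 85, priors .7 and .3.
crit = critical_value(theta0=25, theta1=30, sigma=4, r0=10, r1=85, p0=0.7, p1=0.3)
accept_h0 = 27 < crit  # the observed beetle was 27 mm long

# Special case (15-67): equal regrets and equal priors give the midpoint.
midpoint = critical_value(25, 30, 4, r0=1, r1=1, p0=0.5, p1=0.5)

print(round(crit, 1), accept_h0, midpoint)
```

The computed critical value agrees with (15-78) to one decimal place, and with equal regrets and priors the log term vanishes so that the critical value is simply halfway between θ₀ and θ₁.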



(b) Comparison with Classical Methods 

The Bayesian hypothesis testing described here involves only two
competing hypotheses H₀ and H₁ (two states of nature θ₀ and θ₁), one of
which must be chosen. This analysis is of limited scope, since hypothesis
testing often involves a composite H₁. Thus we have covered only that
material paralleling the first section of Chapter 9. In recalling that classical
test, we note that it had the advantage of being far simpler; but it was also
less satisfactory. It used only the probability function p(x/θ), while the
Bayesian method also exploits the prior distribution p(θ) and regrets (the
loss function); we have seen in the last section how important both these
can be in setting up an appropriate test. Restated, the classical method sets
α = 5 or 1%, sometimes arbitrarily, sometimes with implicit reference to
vague considerations of loss and prior belief. Bayesians would argue that
these considerations should be explicitly introduced, with all the assumptions
exposed, and open to criticism and improvement.



PROBLEMS 

15-12 Using p(θ) and losses given in Table 15-10:

(a) Reconstruct the hypothesis test of question (c) above. With your
measurement of 27 mm, what would you do? Why does our argument
in the last paragraph (spray even if beetle is 25 mm) no longer hold?

(b) Suppose that species S₀ and S₁ were equally frequent. Would that
alter your decision?

(c) How frequent would species S₀ have to be in order to alter your
decision?

15-13 Suppose a psychiatrist has to classify people as sick or well
(hospitalized or not) on the basis of a psychological test. The test scores are
normally distributed, with σ = 8, and mean θ₀ = 100 if they are well
or θ₁ = 120 if they are sick. The losses (regrets) of a wrong classification
are obvious: if a healthy person is hospitalized, resources are
wasted and the person himself may even be hurt by the treatment.
Yet the other loss is even worse: if a sick person is not hospitalized,
he may do damage, conceivably fatal. Suppose this second loss is
considered roughly five times as serious. From past records it has been
found that of the people taking the test, 60% are sick and 40% are
healthy.

(a) (1) What should be the critical score above which the person is
classified as sick? Then

(2) What is α? (Probability of type I error).

(3) What is β? (Probability of type II error).

(b) (1) If a classical test is used, arbitrarily setting α = 5%, what then
will be the critical score? Then

(2) What is β?

(3) By how much has the average loss increased by using this
less-than-optimal method?

(c) What would we have to assume the ratio of the two regrets to be
in order to arrive at a Bayesian test having α = 5%? Do you think
it is reasonable?



*15-7 GAME THEORY 

At this point we leave the general argument of this book to consider a 
rather interesting branch of decision theory. Recall that the concept of 
probability was developed in Chapter 3 as a groundwork for the statistical 
deduction and induction that followed. Game theory is not part of this 
statistical theory; rather, it illustrates a quite different application of the 
concept of probability. 

Game theory is a way of analyzing conflict situations. These may arise, 
for example, in poker, business, or politics; thus our conflicting parties 
might be card players playing for insignificant stakes, oligopolists playing to 
remain in business, or military leaders engaged in a desperate set of moves 
and countermoves. 



(a) Strictly Determined Games 

The players employ strategies. Because a player can choose his strategy, 
he has some control over the outcome of the game. But he is not in complete 
control; the outcome will also depend on the strategy of his opponent. 

The way the outcome of the game is related to the strategy of both
players is shown in Table 15-11; this is called the "payoff matrix," and defines
the payoff going to player A. Thus if A selects strategy 2 and B strategy 1,
A receives 20. There might also be a payoff matrix for B, similarly dependent
on the strategies selected by the two players. However, to keep the discussion
simple, suppose that this is a "zero-sum" game, i.e., what A gains, B loses.
Thus, Table 15-11 defines not only the gain matrix for A, but also the loss
matrix (or loss function) for B. A should be selecting a strategy to make the
outcome as large as possible, while B should be trying to keep the outcome
as small as possible.

Table 15-11 An Example of a Payoff
Matrix for A (Loss Function for B)

                      B's Strategies
                      1      2      3

               1     25      6     11
A's            2     20     10     18
Strategies     3     11      7      5

Obviously B will have no interest in playing the game shown in Table
15-11 since he can do nothing but lose. So we might think of a payoff matrix
normally involving some positive elements (where B pays A) and some
negative ones (where A pays B). Alternatively, in order to induce B to play
the game shown in Table 15-11, A might bribe B $12 for each time he plays.
This is the assumption we now make, in order to keep our payoff matrix
all-positive for easier geometric interpretation. The question is "With this
$12 side payment, is it in B's interest to play this game?"

If a player can select his strategy after he knows how his opponent has
committed himself, his appropriate strategy is obvious and the game becomes
a trivial one. For example, if it is known that B has chosen strategy 3, A
will just scan column 3, select the largest payoff (18) and then play that
strategy (2). The essence of game theory, however, is that each player must
commit himself without knowledge of his opponent's decision; he only
knows the payoff matrix. We further assume that the game is repeated many
times. The only clues a player has about his opponent's strategy must come
from observing his past pattern of play.

In these circumstances A finds the continuous play of strategy 1
unattractive. It is true that this row has the largest possible payoff ($25). But
this requires B's cooperation in playing his strategy 1, and it is clearly not
in B's interest to cooperate. Indeed if B observes that A is continuously
playing strategy 1, he will select strategy 2, thus keeping the payoff down to
6. A finds strategy 3 similarly unattractive; B will counter with strategy 3,
reducing A's payoff to only 5. A chooses strategy 2; the very best play by B
will still yield A a payoff of 10. Now review why A chose strategy 2. He
calculated the minimum value in each row, and then selected the largest of
these minimum values. This maximum of the row minima is called the
"maximin."

Now consider the problem from B's point of view. Recall that he wants
to keep the payoff as low as possible. Strategy 1 is ill-advised; when A observes
him playing this, he will counter with 1, leaving B with a loss of $25. Strategy
3 is also rejected; it may cost him $18. He selects strategy 2; the most it
can cost him is $10. Note that B calculates the maximum value of each
column, and then selects the smallest of these. This minimum of the column
maxima is called the "minimax." Note that in this special case minimax
occurs at the same point as maximin, with a payoff of 10. In this game,
A will play his strategy 2, and B will play his strategy 2; this is called a
"strictly determined" game, because minimax and maximin coincide.

This is illustrated in Figure 15-9, a diagram of the payoff matrix with
each payoff measured vertically. At X we note a "saddle point," which is
both the largest element in its column and the smallest element in its row.
When such a saddle point exists, it is both maximin and minimax.

[Figure 15-9: diagram of the payoff matrix with each payoff measured
vertically; A's strategies along one axis.]

FIG. 15-9 Payoff matrix in Table 15-11.

Summary. In this strictly determined game, both A and B will play
strategy 2. The payoff (from B to A) is always 10, so that it is clearly a game
B will wish to play if he is bribed $12 to do so.
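
The maximin and minimax calculations described above can be sketched in Python. This is an illustration only: the function names `maximin` and `minimax` are assumptions, and the matrix is the payoff matrix of Table 15-11 with rows as A's strategies and columns as B's.

```python
# Payoff matrix of Table 15-11: rows are A's strategies, columns are B's.
PAYOFF = [
    [25, 6, 11],
    [20, 10, 18],
    [11, 7, 5],
]

def maximin(matrix):
    """A's choice: the row whose minimum payoff is largest (1-based index)."""
    row_minima = [min(row) for row in matrix]
    value = max(row_minima)
    return row_minima.index(value) + 1, value

def minimax(matrix):
    """B's choice: the column whose maximum payoff is smallest (1-based index)."""
    column_maxima = [max(column) for column in zip(*matrix)]
    value = min(column_maxima)
    return column_maxima.index(value) + 1, value

a_strategy, a_value = maximin(PAYOFF)
b_strategy, b_value = minimax(PAYOFF)

# The game is strictly determined when the two values coincide (a saddle point).
strictly_determined = a_value == b_value
print(a_strategy, b_strategy, a_value, strictly_determined)
```

For this matrix both players arrive at strategy 2 and the common value of 10, so the sketch reports a strictly determined game.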



PROBLEMS 



15-14 What is the appropriate strategy for each opponent, in the following
games; in each case decide which player the game favors.

(a)
                      B
              1     2     3     4

        1     3    -1     0    -2
   A    2     2     2     1     3
        3    -2     0    -1    -3

(b)
                   B
              1     2     3

   A    1    10    -2    10
        2    20    -1     5

(c)
                B
              1     2

        1   -20    -4
   A    2    -6    -2
        3     1     0

(b) Mixed Strategies

Let us now try to apply the theory of part (a) to the following game:



Table 15-12

                   B
              1     2     3

   A    1     3     6     2
        2     5     4     8

Minimax = 5 (row 2, column 1); Maximin = 4 (row 2, column 2)


A would select his strategy 2; this is the row with the largest minimum value 
(maximin = 4); at the same time B would select his strategy 1; this is the 
column with the smallest maximum value (minimax = 5). But now problems 
arise; because minimax and maximin do not coincide, there is no saddle 
point. Such a game is not strictly determined, with each playing only one 
strategy; it is easy to see why. B begins by playing column 1, while A plays 
row 2; the payoff is 5. Now B observes, that as long as A is playing row 2, 
he can do better by playing column 2, thus reducing the payoff to 4. But 
when B switches to column 2, it is now in A's interest to switch to row 1, 
raising the payoff to 6. As an exercise the student should confirm that a whole 
series of such moves and countermoves are set into play — eventually drawing 
the players in a circle around to the initial position. Then a new cycle begins. 

This will continue until the players recognize a fundamental point. Once 
a player allows his strategy to be predicted, he will be hurt. Thus, for example, 
when A's strategy becomes clear, he can be hurt by B. What is his defense? 

A's best plan is to keep B guessing. Thus if A determines his strategy by 
a chance process, B will be unable to predict what he will do. For example, 
A might toss a coin, playing row 1 if heads, or row 2 if tails. He is using a 
"mixed strategy," weighting each row with a probability of .50. Now B 
doesn't know what to expect; the only question left for A is whether this 
50/50 mix is the best set of odds to use. 

The best mix of strategies for A is determined in Figure 15-10. Along
the horizontal axis we consider various probabilities that A may attach to
playing row 1. This is all A has to select; once this is determined (e.g., if
A sets p = 1/3), then the probability attached to playing row 2 is also
determined (1 − 1/3 = 2/3).

[Figure 15-10: expected value V of the game plotted against p = A's
probability for playing row 1 (axis marked at 0, 1/8, 1/4, 1/2, 1).
Annotations: if A sets p = 1/8, B will always play 2 and V₂ = 4¼; if A sets
p = 1/4, then V is maximized at 4½ regardless of whether B plays 1 or 2;
if A sets p = 1/2, B will always play 1 and V₁ = 4.]

FIG. 15-10 Determination of A's mixed strategy (for game shown in Table 15-12).

Vertically, we plot the expected value of the game— which, of course, 
not only depbnds on the probabilities A may select, but also on what B may 
do. If B plays! only column 1 , then the expected value of the game is a function 
only of die probability A may select; this appears in this diagram as the line 
V v It is worfh examining in detail. 

At ' 
plays ro 
extreme 
A sets p 



the extreme left, if A sets p = 0 (i.e., never plays row 1, but always 
iw 2); then the value of the game V r is 5. On the other hand, at the 
righi, if A sets p = 1 (and always plays row 1), then \\ — 3- Or if 
=1/2, then 

V 1 = 3(1/2) + 5(1/2) = 4 (15-79) 
Generally, for any probability p that A may select: 

V 1 = 3(p) + 5(1 - p) = 5 - 2p (15-80) 

y 

The form o( (15-80) confirms that V x is a straight line function of p, the 
probability A selects. Similarly, if B plays only column 2, then 

F 2 = 6p + 4(1 -/>) = 4 + "2p -(15-81) 

Or, if B plays only column 3, then 

F 3 = 2/> + 8(1 --/;) = 8-6/; (15-82) 

These last two equations are also graphed in Figure 15-10. 

The game is now laid out for A to analyze, with his problem being to 
select p. If he selects p = 1/8, his opponent will counter by always playing 
2, and keep the expected value of the game at 4 1/4. [This is shown geometrically, 
and confirmed by evaluating (15-81) setting p = 1/8.] Or if A selects p = 1/2, 
B will counter with 1, thus keeping the expected value of the game down to 
4. Since A is dealing with an opponent who will be selecting strategies to 
keep V low, the expected value of the game from A's point of view is shown 
as the hatched line in Figure 15-10. The best A can do, therefore, is to select 
p = 1/4. This guarantees V = 4 1/2; moreover, note that this is the intersection 
of V_1 and V_2. Thus this is the value of the game regardless of whether B 
plays 1 or 2. This geometric solution may be read from Figure 15-10, or 
determined algebraically by setting V_1 = V_2, using (15-80) and (15-81): 

5 - 2p = 4 + 2p (15-83) 

p = 1/4 (15-84) 

Finally, this value of p is substituted back into (15-81) for the value of the 
game: 

V_2 = 4 + 2(1/4) = 4 1/2 (15-85) 
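
The geometric argument can also be checked numerically. The following is a minimal sketch, not part of the original text: it takes the payoffs from equations (15-80) to (15-82), treats A's guaranteed value as the lowest of the three lines, and searches the endpoints and pairwise intersections for the p that maximizes it. The function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

# Payoffs to A: rows are A's plays, columns are B's plays,
# as implied by equations (15-80) to (15-82).
PAYOFF = [[3, 6, 2],
          [5, 4, 8]]

def column_value(payoff, j, p):
    """Expected value V_j when A plays row 1 with probability p and B plays column j."""
    return payoff[0][j] * p + payoff[1][j] * (1 - p)

def guaranteed_value(payoff, p):
    """Lowest of the lines V_j at p: what B can hold the game down to."""
    return min(column_value(payoff, j, p) for j in range(len(payoff[0])))

def solve_two_row_game(payoff):
    """Return (p, value) maximizing the guaranteed value over 0 <= p <= 1."""
    candidates = {Fraction(0), Fraction(1)}
    for j, k in combinations(range(len(payoff[0])), 2):
        slope_j = payoff[0][j] - payoff[1][j]
        slope_k = payoff[0][k] - payoff[1][k]
        if slope_j != slope_k:
            # Intersection of V_j and V_k.
            p = Fraction(payoff[1][k] - payoff[1][j], slope_j - slope_k)
            if 0 <= p <= 1:
                candidates.add(p)
    best_p = max(candidates, key=lambda p: guaranteed_value(payoff, p))
    return best_p, guaranteed_value(payoff, best_p)

p, value = solve_two_row_game(PAYOFF)
print(p, value)
```

Only endpoints and intersections need to be examined because the lowest of several straight lines can peak only at such points. For this game the search agrees with (15-84) and (15-85).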

Thus A decides to attach a probability of 1/4 to playing row 1. How 
does he put this into practice? There are several possibilities; for example, 
he might toss 2 coins. If they both come up heads (probability 1/4), then he 
plays row 1; if not, he plays row 2. If this game is repeated many times, A 
will ensure that he receives an average payoff which will tend towards 4 1/2, and 
there is nothing B can do to reduce this. All B can hope for is that A has bad 
luck (e.g., by the luck of the toss, A plays row 1 when B is playing column 1). 
This sort of bad luck can reduce A's average winnings below 4 1/2 if the game is 
played only a few times (or A's good luck can raise his average winnings 
above 4 1/2); but as the game is played over and over, the element of bad luck 
tends to fade out. 
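
The fading of luck can be illustrated with a short simulation. This is a sketch under stated assumptions rather than anything from the original text: the payoffs are those implied by (15-80) to (15-82), B is assumed to play one fixed column throughout, and the seed and numbers of plays are arbitrary choices.

```python
import random

# Payoffs to A, as implied by equations (15-80) to (15-82).
PAYOFF = [[3, 6, 2],
          [5, 4, 8]]

def both_heads(rng):
    """Toss two fair coins; True when both show heads (probability 1/4)."""
    first = rng.random() < 0.5
    second = rng.random() < 0.5
    return first and second

def average_payoff(column, n_plays, rng):
    """Average payoff to A over n_plays when B always plays `column` (0-based)."""
    total = 0
    for _ in range(n_plays):
        row = 0 if both_heads(rng) else 1  # row 1 only on two heads
        total += PAYOFF[row][column]
    return total / n_plays

rng = random.Random(0)  # fixed seed, chosen only for repeatability
print("5 plays:      ", average_payoff(0, 5, rng))
print("200,000 plays:", average_payoff(0, 200_000, rng))
```

With only a few plays the average can land well away from 4 1/2; with many plays it settles close to it, whether B keeps to column 1 or to column 2.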



PROBLEMS 

=> 15-15 (a) Let's play a variation on matching coins. Each of us will choose 
heads or tails, independently and secretly. I'll pay you $30 if I show 
tails and you show heads. I'll pay you $10 if I show heads and you 
show tails. Finally, to make it fair, you pay me $20 if we match (i.e., 
both show heads, or both show tails). Do you want to play? Why? 
(b) What are the optimal strategies of the two players in an ordinary 
game of matching pennies? (Recall that in this game, one player gets 
the pennies if they match, the other gets them if they don't match.) 
Would you still toss your penny in such a game, rather than secretly 
selecting a head or tail? Why? 

15-16 You find yourself on a long sea voyage. You wish to match pennies, 
but your companion wants to play cards. He therefore suggests a 
compromise. You choose heads or tails while he selects an ace. If 
you select a head he pays you $15, $4, $-5, and $1 respectively, 
depending on whether he's chosen the spade, heart, diamond or club 
ace. If you select a tail, he pays you $-10, $-2, $1, and $-5, again 
depending on which ace he's chosen. 

(a) Do you agree to play? Why? What strategies? 

(b) If you were to play this game five times and found you had won 
$5, what would you conclude? 

(c) Are there any two lines in your diagram that do not intersect? 
From both the diagram and the payoff matrix, show that, no matter 
what the circumstances, it is always preferable for him to select the 
club ace instead of the heart ace (i.e., the heart strategy is "dominated 
by" the club strategy). By initially examining the payoff matrix, 
couldn't he have dropped the heart strategy from all further 
consideration? 



(c) Conclusions 

In solving for the best game strategy, the first step is to test whether 
maximin and minimax coincide. If they do, this is a strictly determined 
game, and the single strategy to be used by each player is determined. 

If minimax and maximin do not coincide, the game is not strictly 
determined. Mixed strategies are called for, and are determined in simple 
cases geometrically or algebraically as we have illustrated. In more complex 
cases, more advanced mathematical techniques are required; but rather than 
extending the mechanical solution, it is more important to consider the 
fundamental philosophy and assumptions underlying game theory: 

1. A player using his best mixed strategy can guarantee a certain 
expected value for the game, regardless of what his opponent may do. However, 
this is only the value towards which the average of many games will tend. 
If the game is only played a few times, luck may raise or lower this 
payoff. 

2. Once the optimal mixed strategy is determined (e.g., p = 1/2), the 
play is dictated by a random process (tossing a coin). It is simply not good 
enough to decide to play each strategy half the time, for example by alternating 
1, then 2, then 1, then 2, and so on. Once the opponent observes this pattern, 
he can predict your next play, and hurt you. Note in the simplest game of 
matching pennies (Problem 15-15b), how badly a player would be hurt if he 
interchanged heads and tails rather than tossed the coin. Once an intelligent 
opponent observed this pattern, he could win every time. Each player must 
be unpredictable, by deciding his play by chance. 

3. The theory of games is a very conservative strategy. It is appropriate 
if a game is being replayed many times against an intelligent strategist, who 
is out to get you, knows the payoff matrix, and can observe your strategy 
mix. If these conditions are not met, chances are you can find a better 
strategy than game play. To illustrate, consider an extreme example. Your 
payoff is: 

                    Opponent 
                   1        2 

    You    1    $4,000     $1 
           2        $4     $2   <- minimax = maximin 



Since maximin and minimax coincide, both you and your opponent should 
play strategy 2 every time. But on the first play, your opponent plays strategy 
1 ! This means either that he is a fool, or unaware of the payoff matrix (and 
that $4,000 debacle that he faces). It doesn't matter which; in these circum- 
stances you drop game strategy, play row 1 and punish your opponent for 
his stupidity or ignorance. 
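
The test for a strictly determined game, and the decision to abandon it once the opponent's play is known, can be sketched as follows. This is an illustrative sketch: the first matrix assumes the extreme example reads $4,000 and $1 in row 1, $4 and $2 in row 2; the second uses the payoffs implied by (15-80) to (15-82). The function names are assumptions made for the example.

```python
def maximin(payoff):
    """Largest of the row minima, and the (0-based) row that achieves it."""
    row_minima = [min(row) for row in payoff]
    value = max(row_minima)
    return value, row_minima.index(value)

def minimax(payoff):
    """Smallest of the column maxima, and the (0-based) column that achieves it."""
    column_maxima = [max(col) for col in zip(*payoff)]
    value = min(column_maxima)
    return value, column_maxima.index(value)

def is_strictly_determined(payoff):
    """True when maximin and minimax coincide."""
    return maximin(payoff)[0] == minimax(payoff)[0]

def best_reply(payoff, column):
    """Row that pays most once the opponent's column is known (0-based)."""
    return max(range(len(payoff)), key=lambda r: payoff[r][column])

EXTREME = [[4000, 1],   # assumed reading of the extreme example
           [4, 2]]
MIXED = [[3, 6, 2],     # game implied by (15-80) to (15-82)
         [5, 4, 8]]

print(is_strictly_determined(EXTREME), maximin(EXTREME), minimax(EXTREME))
print(is_strictly_determined(MIXED), maximin(MIXED), minimax(MIXED))
print(best_reply(EXTREME, 0))  # opponent is seen playing column 1
```

For the extreme example both tests point to strategy 2, yet the best reply to an opponent observed in column 1 is row 1. For the second game the two values differ, so a mixed strategy is called for.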

Game theory also should not be used in games against nature. As an 
example, suppose you are trying to decide whether to hold a picnic indoors 
or outdoors. 



                                  Nature 
                             Rain     Not Rain 
                               1          2 

    You hold picnic: 
        Indoors    1         100          0 
        Outdoors   2           0      1,000 



Your profit depends on both where the picnic is held, and the weather. 
You can easily confirm that game theory means selecting p = 10/11, with an 
expected profit of just over $90. These odds mean that you will probably 
hold the picnic indoors. 

Something clearly has gone wrong. An intuitive glance at the payoff 
matrix suggests you should go outdoors, providing there is a reasonable 
expectation that it won't rain. Game strategy is inappropriate because it is 
based on the false premise that nature is an opponent, determining the 
weather with the sole objective of ruining your picnic (i.e., minimizing V). 
Instead, nature's odds are determined independently; and let us suppose 
that the probability is 4/5 that it will not rain. With these odds, you should 
be holding the picnic outdoors, with an expected profit of: 



0(1/5) + 1000(4/5) = 800 (15-86) 



The more complicated game solution is dead wrong in this case, because one 
of the key game theory assumptions (nature is intelligent and out to get you) 
simply does not hold. The student will immediately see that the simpler 
solution (15-86) is required; and this of course is the Bayesian, or expected 
value, solution outlined in Section 15-2. 
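
The two solutions to the picnic problem can be compared directly. The sketch below is illustrative only: it uses the profits from the picnic table (100 indoors in rain, 1,000 outdoors without rain, 0 otherwise) and the supposed probability of 4/5 that it will not rain. The helper names are assumptions.

```python
from fractions import Fraction

# Profit by (where the picnic is held, weather), from the picnic table.
PROFIT = {"indoors": {"rain": 100, "no rain": 0},
          "outdoors": {"rain": 0, "no rain": 1000}}

def equalizing_probability(a11, a12, a21, a22):
    """Probability p of playing row 1 that makes the expected payoff
    the same in both columns of a 2 x 2 game."""
    return Fraction(a22 - a21, (a11 - a21) - (a12 - a22))

def expected_profit(action, p_no_rain):
    """Expected profit of one action when nature's odds are fixed."""
    p_rain = 1 - p_no_rain
    return (PROFIT[action]["rain"] * p_rain
            + PROFIT[action]["no rain"] * p_no_rain)

# Game solution: treat nature as a hostile opponent.
p_indoors = equalizing_probability(100, 0, 0, 1000)
game_value = 100 * p_indoors  # same value whether it rains or not

# Bayesian solution: nature's odds are set independently.
p_no_rain = Fraction(4, 5)
bayes = {action: expected_profit(action, p_no_rain) for action in PROFIT}

print(p_indoors, float(game_value))
print(bayes)
```

The game solution holds the picnic indoors with probability 10/11 for an expected profit of 1000/11, about $90.91, while with the stated odds the outdoor picnic is worth 800 and the indoor picnic only 20.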

In conclusion, if a prior distribution p(θ) does not exist independently, 
but rather is determined by a hostile opponent, then game theory is 
appropriate; but even under these conditions it may be too conservative a line of 
play, unless the opponent is highly intelligent and informed. On the other 
hand, if p(θ) is determined independently (e.g., rain versus shine), then 
Bayesian methods are required. 

Table I Squares and Square Roots 

  N       N^2       √N        √(10N) 

 1.00    1.0000    1.00000    3.16228 
 1.01    1.0201    1.00499    3.17805 
 1.02    1.0404    1.00995    3.19374 
 1.03    1.0609    1.01489    3.20936 
 1.04    1.0816    1.01980    3.22490 
 1.05    1.1025    1.02470    3.24037 
 1.06    1.1236    1.02956    3.25576 
 1.07    1.1449    1.03441    3.27109 
 1.08    1.1664    1.03923    3.28634 
 1.09    1.1881    1.04403    3.30151 
 1.10    1.2100    1.04881    3.31662 
 1.11    1.2321    1.05357    3.33167 
 1.12    1.2544    1.05830    3.34664 
 1.13    1.2769    1.06301    3.36155 
 1.14    1.2996    1.06771    3.37639 
 1.15    1.3225    1.07238    3.39116 
 1.16    1.3456    1.07703    3.40588 
 1.17    1.3689    1.08167    3.42053 
 1.18    1.3924    1.08628    3.43511 
 1.19    1.4161    1.09087    3.44964 
 1.20    1.4400    1.09545    3.46410 
 1.21    1.4641    1.10000    3.47851 
 1.22    1.4884    1.10454    3.49285 
 1.23    1.5129    1.10905    3.50714 
 1.24    1.5376    1.11355    3.52136 
 1.25    1.5625    1.11803    3.53553 
 1.26    1.5876    1.12250    3.54965 
 1.27    1.6129    1.12694    3.56371 
 1.28    1.6384    1.13137    3.57771 
 1.29    1.6641    1.13578    3.59166 
 1.30    1.6900    1.14018    3.60555 
 1.31    1.7161    1.14455    3.61939 
 1.32    1.7424    1.14891    3.63318 
 1.33    1.7689    1.15326    3.64692 
 1.34    1.7956    1.15758    3.66060 
 1.35    1.8225    1.16190    3.67423 
 1.36    1.8496    1.16619    3.68782 
 1.37    1.8769    1.17047    3.70135 
 1.38    1.9044    1.17473    3.71484 
 1.39    1.9321    1.17898    3.72827 
 1.40    1.9600    1.18322    3.74166 
 1.41    1.9881    1.18743    3.75500 
 1.42    2.0164    1.19164    3.76829 
 1.43    2.0449    1.19583    3.78153 
 1.44    2.0736    1.20000    3.79473 
 1.45    2.1025    1.20416    3.80789 
 1.46    2.1316    1.20830    3.82099 
 1.47    2.1609    1.21244    3.83406 
 1.48    2.1904    1.21655    3.84708 
 1.49    2.2201    1.22066    3.86005 
 1.50    2.2500    1.22474    3.87298 



N 


N 2 


VN 


VlON 


1.50 


2.2500 


1.22474 


3.87298 


1 El 
1*01 

1.52 
1.53 


2.3104 
2.3409 


1.23288 
1,23693 


t OOCC7 
0.0000/ 

3.89872 
3.91152 


1 CA 

i.54 
1.55 
1.56 


2.3716 
2.4025 
2.4336 


1.24499 
1.24900 


t no A OQ 
3.92428 

3.93700 

3.94968 


1.57 
1.58 
1.59 


2.4649 
2.4964 
2.52S1 


1.25300 
1.25698 
1.26095 


3.96232 
3.97492 
3.98748 


1.60 


2.5600 


1.26491 


4.00000 


i.bi 
1.62 
1.63 


O COOT 

2.6244 
2.6569 


1 .26886 
1.27279 
1.27671 


4.01248 
4.02492 
4.03733 


1. o4 
1.65 
1.66 


2.7225 
2.7556 


1.28452 
1,28841 


4.U4yoy 
4.06202 
4.07431 


1.67 
1.68 
1.69 


2.7889 
2.8224 
2.8561 


1.29228 
1.29615 
1.30000 


4.08656 
4.09878 
4.11096 


1.70 


2.8900 


1.30384 


4.12311 


J../ 1 
1.72 
1.73 


2.9584 
2.9929 


L .30/ o/ 
1.31149 
1.31529 


4.13521 
4.14729 
4.15933 


1 7 A 

J. .74 

1.75 
1.76 


O.Uz/o 
3.0625 
3.0976 


1.3iy09 
1.32288 
1.32665 


a ivi it 

4.1/133 

4.18330 
4.19524 


1.77 
1.78 
1.79 


3.1329 
3.1684 
3.2041 


1.33041 
1.33417 
1.33791 


4.20714 
4.21900 
4.23084 


1.80 


3.2400 


1.34164 


4.24264 


1 ft! 

1.82 
1.83 


3.3124 
3.3489 


J..o4o3b 
1.34907 
1.35277 


A OtZAAl 

4,zo441 
4.26615 
4.27785 


1.01 

1.85 
1.86 


o.oooo 
3.4225 
3.4596 


X .0004/ 

1.36015 
1.36382 


4.zoyo^ 
4.30116 
4.31277 


1.87 
1.88 
1.89 


3.4969 
3.5344 
3.5721 


1.36748 
1.37113 
1,37477 


4.32435 
4.33590 
4.34741 


1.90 


3.6100 


.1.37840 


4.35890 


1.91 
1.92 
1.93 


3.6481 
3.6864 
3.7249 


1.38203 
1.38564 
1.58924 


4.37035 
4.38178 
4.39318 


1 .94 
1.95 
1.96 


3.7636 
3.8025 
3.8416 


1 .3yJo4 
1.39642 
1.40000 


4.40454 
4.41588 
4.42719 


1.97 
1.98 
1.99 


3.8809 
3.9204 
3.9601 


1.40357 
1.40712 
1.41067 


4.43847 
4.44972 
4.46094 


2.00 


4.0000 


1.41421 


4.47214 


N 


N 2 


Vn 


VTon 
N 


N^2 


√N 


√(10N) 


2.00 


4.0000 


1.41421 


4.47214 


2.0i 
2.02 
2.03 


4.0401 
4.0804 
4.1209 


1.41774 
1.42127 
1.42478 


4.48330 
4.49444 
4.50555 


2.04 
2.05 
2.06 


4.1616 

4,2025 
4.2436 


1.42829 
1.43178 
1.43527 


4.51664 
4.52769 
4.53872 


2.07 
2.08 

9 HQ 


4.2849 
4.3264 


1.43875 
1.44222 

X .44 ODO 


4.54973 
4.56070 

4 .£>/ iOO 


2.10 


4.4100 


1.44914 


4.58258 


2.11 
2.12 
2.13 


4.4521 
4.4944 
4.5369 


1.45258 
1.45602 
1.45945 


4.59347 
4.60435 
4.61519 


2.14 
2.15 
2.16 


4.5796 
4.6225 
4.6656 


1.46287 
1.46629 
1.46969 


4.62601 
4.63681 
4.64758 


2.17 
2.18 

9 1 Q 

6. iy 


4.7089 
4.7524 
4.7961 


1.47309 
1.47648 

1.**/ JOO 


4.65833 
4.66905 

A CL7Q7A 

4.0/y/ 1 


2.20 


4.8400 


1.48324 


4.69042 


2.21 
2.22 
2.23 


4.8841 
4.9284 
4.9729 


1.48661 
1.48997 
1.49332 


4.70106 
4.71169 
4.72229 


2,24 
2.25 
2.26 


5.0176 
5.0625 
5.1076 


1.49666 
1.50000 
1.50333 


4.73286 
4.74342 
4.75395 


2.27 
2.28 

9 90 


5.1529 
5.1984 
5.2441 


1.50665 
1.50997 


4.76445 
4.77493 
oooy 


2.30 


5.2900 


1.51658 


4.79583 


2.31 
2.32 
2.33 


5.3361 
5.3824 
5.4289 


1.51987 
1.52315 
1.52643 


4.80625 
4.81664 
4.82701 


2,34 
2.35 
2.36 


5.4756 
5.5225 
5.5696 


1.52971 

1.53297 
1.53623 


4.83735 
4.84768 
4.85798 


2.37 
2.38 

Z.oy 


5.6169 
5.6644 
n 71 91 


1.53948 
1.54272 


4.86826 
4.87852 

4.0OO/O 


2.40 


5.7600 


1.54919 


4.89898 


2.41 
2.42 
2.43 


5.8081 
5.8564 
5.9049 


1.55242 
1.55563 
1,55885 


4.90918 
4.91935 
4.92950 


2.44 
2.45 

*> ACL 

/.4b 


5.9536 
6.0025 
o.Uolo 


1.56205 
1.56525 
1.56844 


4.93964 
4.94975 

A r\f tiO A 

4.95984 


O A7 

2.48 
2.49 


6.1504 
6.2001 


1.57162 
1.57480 
1.57797 


4,96991 
4,97996 
4.98999 


2.50 


6.2500 


1.58114 


5.00000 


N 


N 2 


V'N 


v'ToN 



(Continued) 



N 


IN* 


V IN 


A / 1 f\ XT 

V 1UJN 


2,50 


6.2500 


1.58114 


5.00000 


2.51 

2.52 
2.53 


6.3001 
6.3504 
6.4009 


1.58430 
1.58745 
1.59060 


5.00999 
5.01996 
5.02991 


2.54 
2.55 
2.56 


6.4516 
6.5025 
6.5536 


1.59374 
1.59687 
1 fiOOOO 


5.03984 
5.04975 


2.57 
2.58 

9 CQ 


6.6049 
6.6564 

£. 7 CIS 1 


1.60312 
1.60624 
1 .OUyoo 


5.06952 
5.07937 
o.uoyzu 


2.60 


6.7600 


1.61245 


5.09902 


2.61 

2.62 
2.63 


6.8121 
6.8644 
6.9169 


1.61555 
1.61864 
1.62173 


5.10882 
5.11859 
5.12835 


2.64 
2.65 
2.66 


6.9696 
7.0225 
7.0756 


1.62481 
1.62788 
1 63095 


5.13809 
5.14782 
5 15752 


2.67 
2.68 
2.69 


7.1289 
7.1824 

/ .ZOO L 


1.63401 
1.63707 

1 iZACW 1 

l.o4UIZ 


5.16720 
5.17687 
O.loooZ 


2.70 


7.2900 


1.64317 


5.19615 


2.71 
2.72 
2.73 


7.3441 
7,3984 
7.4529 


1.64621 
1.64924 
1.65227 


5.20577 
5.21536 
5.22494' 


2.V4 

2.75 
2.76 


7.5076 
7.5625 
7.6176 


1.65529 
1,65831 
1 66132 


5.23450 
5.24404 


2.77 
2.78 

9 7(1 


7.6729 
7.7284 

7 7RA1 


1.66433 
1.66733 

1 .D/UOO 


5,26308 
5,27257 


2.80 


7.8400 


1.67332 


5.29150 


2.81 
2.82 
2. S3 


7.8961 
7.9524 
8.0089 


1.67631 
1.67929 
1.68226 


5.30094 
5.31037 
5.31977 


2.84 
2.85 
2.86 


8.0656 
8.1225 
8.1796 


1.68523 
1.68819 
1.691 15 


5.32917 
5.33854 
5.34790 


2.87 
2.88 
2.89 


8.2369 
8.2944 
8.3521 


1.69411 

1.69706 
1 .70000 


5.35724 
5.36656 
5.37587 


2.90 


8.4100 


1.70294 


5.38516 


2.91 
2.92 
2.93 


8.4681 
8.5264 
8.5849 


1.70587 
1.70880 
1.71172 


5.39444 
5.40370 
5.41295 


2.94 
2.95 
z.yb 


8.6436 
8.7025 
8.7616 


1.71464 
1,71756 
1.72047 


5.42218 
5.43139 
5.44059 


2.97 

2.98 
2.99 


8.8209 
8.8804 
8.9401 


1.72337 
1.72627 
1.72916 


5.44977 
5.45894 
5.46809 


3,00 


9.0000 


1.73205 


5.47723 


N 


N 2 


v'n 


v Ton 
Table I (Continued) 



N 2 



VN 



9.0000 



1.73205 



9.0601 
9.1204 
9.1809 

9.2416 
9.3025 
9.3636 

9.4249 
9.4864 
9.5481 

9.6100 



1.73494 
1.73781 
1.74069 

1.74356 
1.74642 
1.74929 

1.75214 
1.75499 
1.75784 



9.6721 
9.7344 
9.7969 

9.8596 
9.9225 
19.9856 

10.0489 
10.1124 
J0.1761 

10.2400 



10.3041 
10.3684 
10.4329 

10.4976 
10.5625 
10.6276 

10.6929 
10.7584 
10.8241 

10.8900 

10.9561 
11.0224 
11.0889 

11.1556 
11.2225 
11.2896 

11.3569 
11.4244 
11.4921 

11,5600 

11.6281 
11.6964 
11.7649 

11.8336 
11.9025 
11.9716 

12.0409 
112.1104 
12.1801 



;12.2500 

I N 2 



1.76068 



1.76352 
1.76635 
1,76918 

1,77200 
1.77482 
1.77764 

1.78045 
1.78326 
1.78606 



1.78885 



1.79165 
1.79444 
1.79722 

1.80000 
1.80278 
1.80555 

1.80831 
1.81108 
1.81384 



1.81659 



1.81934 
1.82209 
1.82483 

1.82757 
1.83030 
1.83303 

1.83576 
1.83848 
1.84120 



1.84391 



1.84662 
1.84932 
1.85203 

1.85472 
1.85742 
1.86011 

1.86279 
1.86548 
1.86815 



1.87083 

VN, 



VlON 



5A7725 



5.48635 
5.49545 
5.50454 

5.51362 
5.52268 
5.53173 

5.54076 

5.54977 
5.55878 



5.56776 



5.57674 
5.58570 
5.59464 

5.60357 
5.61249 
5.62139 

5.63028 
5.63915 
5.64801 



5.65685 



5.66569 
5.67450 
5.68331 

5.69210 
5.70Q88 
5:7t>964 

5.71839 
5.72713 
5.73585 



5.74456 



5.75326 
5.76194 
5.77062 

5.77927 
5.78792 
5.79655 

5.80517 
5.81378 
5.82237 



5.83095 



5.83952 
5.84808 
5.85662 

5.86515 
5.87367 
5.88218 

5.89067 
5.89915 
5.90762 



5.91608 



N 


N 2 


VN 


v'Ton 


3.50 


12.2500 


1.87083 


5.91608 


3.51 
3.52 
3.53 


12.3201 
12.3904 
12.4609 


1.87350 
1,87617 
1.87883 


5.92453 
5.93296 
5.94138 


3.54 
3.55 
3,56 


12.5316 
12.6025 
12.6736 


1.88149 
1.88414 
1.88680 


5.94979 
5.95819 
5.96657 


3.57 
3.58 
3.59 


12.7449 
12.8164 
12.8881 


1.88944 
1.89209 
1.89473 


5.97495 
5 98331 
5.99166 


3.60 


12.9600 


1.89737 


6.00000 


3.61 
3.62 
3.63 


13.0321 
13.1044 
13.1769 


1.90000 
1.90263 
1.90526 


6.00833 
6.01664 
6.02495 


3.64 

3.65 
3.66 


13.2496 
13.3225 
13.3956 


1.90788 
1.91050 
1.91311 


6.03324 
6.04152 
6.04979 


3.67 
3.68 
3.69 


13.4689 
13.5424 
13.6161 


1.91572 
1.91833 
1.92094 


6.05805 
6.06630 
6.07454 


3.70 


13.6900 


1.92354 


6.0&276 


3.71 

3.72 
3.73 


13.7641 
13.8384 
13.9129 


1.92614 
1.92873 
1.93132 


6.09098 
6.09918 
6.10737 


3.74 
3.75 
3.76 


13.9876 
14.0625 
14.1376 


1.93391 
1.93649 
1.93907 


6.11555 
6.12372 
6.13188 


3.77 
3 78 
3.79 


14.2129 
14.2884 
14.3641 


1.94165 
1.94422 
1.94679 


6.14003 
6.14817 
6.15630 


3.80 


14.4400 


1.94936 


6.16441 


3.81 
3.82 
3.83 


14.5161 
14.5924 
14.6689 


1.95192 
1.95448 
1.95704 


6.17252 
6.18061 
6.18870 


3.84 
3.85 
3.86 


14.7456 
14.8225 
14.8996 


1.95959 
1.96214 
1.96469 


6.19677 
6.20484 
6.21289 


3.87 
3.88 
3.89 


14.9769 
15 0544 
15.1321 


1.96723 
1.96977 
1.97231 


6.22093 
6.22896 
6.23699 


3.90 


15.2100 


1.97484 


6,24500 


3.91 
3.92 
3.93 


15.2881 
15.3664 
15.4449 


1.97737 
1.97990 
1.98242 


6.25300 
6.26099 
6.26897 


3.94 
3.95 
3.96 


15.5236 
15.6816 


1.98494 
1.98746 
L98997 


6.27694 

f. OQAQ(\ 

O.AOT! 

6.29285 


3.97 
3.98 
3.99 


15.7609 
15.8404 
15.9201 


1.99249 
1.99499 
1.99750 


6.30079 
6.30872 
6.31664 


4.00 


16.0000 


2.00000 


6.32456 


N 


N 2 


VN 


VToN 
Table I (Continued) 



N 


N 2 


v'n 


vTon 


4.00 


16.0000 


2.00000 


6.32456 


4.01 
4*02 
4.03 


16 0S01 
16.1604 
16.2409 


2.00250 
2*00499 
2.00749 


ft W1A(\ 

6.34035 
6.34823 


4 04 

4*05 
4 0£ 


16 3216 
16.4025 

10. 4000 


9 00Q98 
2.01246 


6 7^10 
6.36396 
b.o/loi 


4.07 
4.08 
4.09 


16.5649 
16.6464 
16.7281 


2.01742 
2.01990 
2.02237 


6.37966 
6.38749 
6.39531 


4.10 


16.8100 


2.02485 


6.40312 


4 11 
4J2 
4.13 


16.8921 
16^9744 
17.0569 


9 097^1 

2.02978 
2.03224 


6.41872 
6.42651 


4 14 
4.15 

A 1 & 

4.10 


17 1396 
17^2225 
l/.OUoO 


2 03470 
2.03715 
Z.U0901 


6.44205 
6.44981 


4.17 
4.18 
4.19 


17.3889 
17.4724 
17.5561 


2.04206 
2.04450 
2.04695 


6.45755 
6.46529 
6.47302 


4.20 


17.6400 


2.04939 


6.48074 


4.21 
4*22 
4.23 


17,7241 
17.8084 
17.8929 


2.05183 
2*05426 
2.05670 


6.49615 
6.50384 


4 24 
4.25 

A 9£ 


17 9776 
18*0625 

xo.L*±i O 


2 05913 
2*06155 

9 

/.UOOVo 


6.51920 

*t C')itQ7 


4.27 
4.28 
4.29 


18.2329 
18.3184 
18.4041 


2.06640 
2.06882 
2.07123 


6.53452 
6.54217 
6.54981 


4.30 


18.4900 


2.07364 


6.55744 


4.31 
4*32 
4.33 


18.5761 
18*6624 
18.7489 


9 07^0^ 

2.07846 
2,08087 


O.OOJw 

6.57267 
6.58027 


4.34 

4*35 


J. O .OO^JJ 

18.9225 


9 0R797 
2.08567 
Z.UooUO 


6.59545 
6.60303 


4.37 
4.38 
4.39 


19.0969 
19.1844 
19.2721 


2.09045 
2.09284 
2.09523 


6.61060 
6.61816 
6.62571 


4.40 


19.3600 


2.09762 


6.63325 


4 41 
4A2 
4.43 


19 4481 
19*5364 
19,6249 


o ioooo 
2.10238 
2.10476 


c it407H 

6.64831 
6.65582 


4.44 
4*45 
4.46 


19.7136 
J 9*8025 
19.8916 


2.10713 
2*10950 
2.11187 


0.00000 

6.67083 
6.67832 


4.47 
4.48 
4.49 


19.9809 
20.0704 
20.1601 


2.11424 
2.11660 
2.11896 


6.68581 
6.69328 
6.70075 


4.50 


20.2500 


2.12132 


6.70820 


N 


N 2 


Vn 


VlON 



N 


N 2 


VN 


Vion 


4.50 


20.2500 


2.12132 


6.70820 


4.51 
4*52 
4.53 


20.4304 
20.5209 


9 1 27fifi 
2.12603 
2.12838 


6.72309 
6.73053 


4 54 
4*55 

4.00 


20 61 16 

20.7025 
zu. /yoo 


2 1 H073 
2.13307 

9 1 7Cil9 

Z.I oo4Z 


6 73795 
6>4537 

it 7C97D 

O./oZ/o 


4.57 
4.58 
4.59 


20.8849 
20.9764 
21.0681 


2.13776 
2.14009 
2.14243 


6.76018 
6.76757 
6.77495 


4.60 


21.1600 


2.14476 


6.78233 


4 61 

4*62 
4.63 


21 .2521 
2L3444 
21.4369 


2 14709 
2*14942 
2.15174 


6 78970 
6*79706 
6.80441 


4 64 
4.65 

A itft 
4.00 


21 5296 
21*6225 

01 VI Kit 

Zl./ J.00 


2.15407 
2J5639 
Z.loo/U 


6 81175 

6*81909 

b.oZ04Z 


4.67 
4.68 
4.69 


21.8089 
21.9024 
21.9961 


2.16102 

2.16333 
2.16564 


6.83374 
6.84105 
6.84836 


4.70 


22.0900 


2.16795 


6.85565 


4.71 
4J2 
4.73 


22.1841 
22*2784 
22.3729 


2.1 7025 
2.17256 
2.17486 


6.86294 
6*87023 
6.87750 


4 74 
4>5 
4,/o 


22 4676 
22*5625 
ZZ. ob/o 


2 17715 
2*17945 

O 1 Q 1 74 


6 88477 
6 - S9202 

it UQ098 


4.77 
4.78 
4.79 


22.7529 
22.8484 
22.9441 


2.18403 
2.18632 
2.18861 


6.90652 
6.91375 
6.92098 


4.80 


23.0400 


2.19089 


6.92820 


4 81 

4*82 
4.83 


97 1 ^fil 

23.2324 
23.3289 


2 1 9317 
2J9545 
2.19773 


f. qxc.a.9 
6.94262 
6.94982 


4.84 
4*85 

4.0O 


93 4956 
23.5225 
zo.oJ.yo 


2.20227 


6 95701 
6*96419 

o.y / JLO/ 


4.87 
4.88 
4.89 


23.7169 
23.8144 
23.9121 


2.20681 
2.20907 
2.21133 


6.97854 
6.98570 
6.99285 


4.90 


24.0100 


2.21359 


7.00000 


A Gl 

4.92 
4.93 


94 i nRi 
24.2064 
24.3049 


9 91 Cftc: 
Z.Z 1 000 

2.21811 
2,22036 


7 00714 
7*01427 
7.02140 


A Q4 

4.95 
4.96 


94 40^6 
24.5025 
24.6016 


2.22261 
2*22486 
2.22711 


7.03562 
7.04273 


4.97 
4.98 
4.99 


24.7009 
24.8004 
24.9001 


2.22935 
2.23159 
2.23383 


7.04982 
7.05691 
7.06399 


5.00 


25.0000 


2.236Q7 


7.07107 


N 


N 2 




Vion 
Table I (Continued) 



N 2 




v'lON N 


X 2 


VN 


VlONT 


25.0000 


2.23607 


7.07107 


5.50 


30.2500 


2.34521 


7.41620 


95 1001 
252004 
25.5009 


2.23830 
224054 
2.24277 


7.07814 
7.08520 
7.09225 


5.51 
5.52 
5.53 


30.4704 
30.5809 


9 7A77A 

2.34947 
2.35160 


7 /4 9 90 4 

7.42967 
7.43640 


lot; a{\'\ f, 
25.5025 
25.6036 


2 24499 
2^24722 
2.24944 


7.10634 
7.11337 


^ za 
5.55 
5.56 


30.8025 
30.9156 


2.35584 
2,35797 


1 AAiyo 
/.44v3 

7.44983 
7.45654 


25.7049 
25.8064 
25.9081 


225167 
225389 
2.25610 


7.120o9 
7.12741 
7.13442 


5.57 
5.58 
5.59 


31.0249 
31.1364 
312481 


2.36008 
2.36220 
2.36432 


7.46324 
7.46994 
7.47665 


26.0100 


2.25832 




5.60 


31.3600 


2.36643 


7.48331 


26.1 1 21 
26,2 144 
26,3169 


2,26053 
226274 
2.26495 


7.14843 
7.15542 
7.16240 


5.61 
5.62 
5.63 


31.5844 
31.6969 


2.37065 
2.37276 


7 /fCOOO 

7.49667 
7.50335 


26.4196 

26^5225 
26.6256 


2 26716 
226936 
2.27156 


7.16938 
7J7635 
7.18331 


5.64 
5^65 
5.66 


0 i .0U70 

31.9225 
32.0556 


9 774S7 

2.37697 

O 7 AO 

2.3/908 


7 CAQOQ 

7.51665 
7.52330 


26.7289 
26.8324 
26.9361 


2.27376 
2.27596 
2.27816 


7.19027 
7.19722 
7.20417 


5.67 
5.68 
5.69 


32.1489 
32.2624 
32.3761 


2.38118 
2.38328 
2.38537 


7.52994 
7.53658 
7.54321 


27.0400 


2.28035 


7 91 1 1 A 


5.70 


32.4900 


2.38747 


7.54983 


27.1441 
272484 
27.3529 


2.28254 
228473 
2.28692 


7.21803 
7.22496 
7.23187 


5,71 
5.72 


32.7184 
32.8329 


Z.oaJoo 
2.39165 
2.39374 


7.55645 
7.56307 
7.56968 


27.4576 
27^5625 
27.6676 


2.28910 
2*.29129 

9 907/f7 


7 ?7Q7ft 

7.24569 
725259 


5.74 
575 
5.76 


79 O/17/C 

33.0625 
35.1776 


9 7QCQ7 

2.39792 
2.40000 


/.o/ozo 
7.58288 
7.58947 


27.7729 
27.8784 
27.9841 


2.29565 
2.29783 
2.30000 


7.25y4S 
7.26656 
7.27324 


5.77 
5.78 
5.79 


33.2929 
53.4084 
33.5241 


2.40208 
2.40416 
2.40624 


7.59605 
7.60263 
7.60920 


28.0900 


2.30217 


7,28011 


5.80 


33.6400 


2.40832 


7.61577 


28.1961 
^8!3024 
28.4089 


2.30454 
2^30651 
2.30868 


7.28697 
7.29383 
7.30068 


5.81 
5.82 
5.83 


35 7561 
35^8724 
33.9889 


2 4 1039 
2^41247 
2.41454 


7.62889 
7.63544 


28.6225 
28.7296 


2.31084 
2]31301 

2.01517 


7.30753 
7^31437 
7.32120 


5. 84 
5.85 
5.86 


34.2225 
34.3396 


Z.4I001 

2.41868 
2.42074 


7.64199 
7.64853 
7.65506 


28.8369 
28.9444 
59.0521 


2.31733 

2.31948 
2.32164 


7.32803 
7.33485 
7.34166 


5.87 
5.88 
5.89 


34.4569 
34.5744 
34.6921 


2.42281 
2.42487 
2.42693 


7.66159 
7.66812 
7.67463 


29.1600 


2.32379 






34.8100 


2.42899 


7.68115 


29.3764 
29.4849 


9 \9 ZQA 

2.32809 
2.33024 


7.35527 
7.56206 
7 36885 


5.91 
5.92 
5.93 


34,9281 
35.0464 
35.1649 


2.43105 
2.43311 
2.43516 


7.68765 
7.69415 
7.70065 


9Q KQZfl 

29,7025 
129.8116 


2.33452 
2.33666 


7,37564 
7.58241 
7.38918 


5.95 
5,96 


352836 
35.4025 
35,5216 


2.43721 
2.43926 
2.44131 


/ .70714 
7.71362 
7.72010 


29.9209 
50.0304 
30.1401 


2.33880 
2.34094 
2.34307 


7.39594 
7.40270 
7.40945 


5.97 
5.98 
5.99 


35.6409 
35.7604 
35.8801 


2.44336 
2.44540 
2.44745 


7.72658 
7.73305 
7.73951 


30,2500 


2.34521 


7.41620 


6.00 


36.0000 


2.44949 


7.74597 


; n 2 


VN 




N 


N 2 


Vn 


VioN" 
Table I (Continued) 



NT 
IN 


N 2 




>/lON 


6.00 


36.0000 


2.44949 


7.74597 


6.01 
6.02 
6.03 


36.1201 
36.2404 
36.3609 


2.45153 
2.45357 
2.45561 


7.75242 
7.75887 
7.76531 


6.04 
6.05 
6.06 


36.4816 
36.6025 
36.7236 


2.45764 
2.45967 
2.46171 


7.77174 
7.77817 
7.78460 


6.07 
6.08 
6.09 


36.8449 
36.9664 
37.0881 


2.46374 
2.46577 
2.46779 


7.79102 
7.79744 
7.80385 


6.10 


37.2100 


2.46982 


7.81025 


6.11 
6.12 
6.13 


37.3321 
37.4544 
37.5769 


2.47184 
2.47386 
2.47588 


7.81665 
7.82304 
7.82943 


6.14 
6.15 
6.16 


37.6996 
37.8225 
37.9456 


2.47790 
2.47992 
2.48193 


7.83582 
7.84219 
7.84857 


6.17 
6.18 
6.19 


38.0689 
38.1924 
38.3161 


2.48395 
2.48596 
2.48797 


7.85493 
7.86130 
7.86766 


6.20 


38.4400 


2.48998 


7.87401 


6.21 
6.22 
6.23 


38.5641 
38.6884 
38.8129 


2.49199 
2.49399 
2,49600 


7.88036 
7.88670 
7.89303 


6.24 
6.25 
6.26 


38.9376 
39.0625 
39.1876 


2.49800 
2.50000 
2.50200 


7.89937 
7.90569 
7.91202 


6.27 
6.28 
6.29 


39.3129 
39.4384 
39.5641 


2.50400 
2.50599 
2.50799 


7.91833 
7.92465 
7.93095 


6.30 


39.6900 


2.50998- 


7.93725 


6.31 
6.32 
6.33 


39.8161 
39.9424 
40.0689 


2.51197 
2.51396 
2.51595 


7.94355 
7.94984 
7.95613 


6.34 
6.35 
6.36 


40.1956 
40.3225 
40.4496 


' 2.51794 
2.51992 
2.52190 


7.96241 
7.96869 
7,97496 


6.37 
6.38 
6.39 


40,5769 
40.7044 
40.8321 


2.52389 
2.52587 
2.52784 


7.98123 
7.98749 
7.99375 


6.40 


40.9600 


2.52982 


8.00000 


6.41 
6.42 
6.43 


41.0881 
41.2164 
41.3449 


2.53180 
2.53377 
2.53574 


8.00625 
8.01249 
8.01873 


6.44 
6.45 
6.46 


41.4736 
41.6025 
41.7316 


2.53772 
2.53969 
2.54165 


8.02496 
8.03119 
8.03741 


D.4/ 

6.48 
6.49 


~tj..i5buy 
41.9904 
42.1201 


2.54558 
2.54755 


8.04363 
8.04984 
8.05605 


6.50 


42.2500 


2.54951 


8.06226 


N 


N 2 


Vn 


v'Xon' 



N 


N 2 


VN 


vTon 


6.50 


42.2500 


2.54951 


8.06226 


6.51 
6.52 
6.53 


42.3801 
42.5104 
42.6409 


2.55147 
2.55343 
2.55539 


8.06846 
8.07465 
8.08084 


6.54 
6.55 
6.56 


42.7716 

42.9025 
43.0336 


2.55734 
2.55930 
2.56125 


8.08703 
8.09321 
8.09938 


6.57 
6.58 
6.59 


43.1649 
43.2964 
43.4281 


2.56320 
2.56515 
2.56710 


8.10555 
8.11172 
8.11788 


6.60 


43.5600 


2.56905 


8.12404 


6.61 
6.62 
6.63 


43.6921 
43.8244 
43.9569 


2.57099 
2.57294 
2.57488 


8.13019 
8.13634 
8.14248 


6.64 
6.65 
6.66 


44.0896 
44.2225 
44.3556 


2.57682 
2.57876 
2.58070 


8.14862 
8.15475 
8.16088 


6.67 
6.68 
6.69 


44.4889 
44.6224 
44.7561 


2.58263 
2.58457 
2.58650 


8.16701 
8.17313 
8.17924 


6.70 


44.8900 


2.58844 


8.18535 


6.71 
6.72 
6.73 


45.0241 
45.1584 
45.2929 


2.59037 
2.59230 
2.59422 


8.19146 
8.19756 
8.20366 


6.74 
6.75 
6.76 


45.4276 
45.5625 
45.6976 


2.59615 
2.59808 
2.60000 


8.20975 
8.21584 
8.22192 


6.77 
6.78 
6.79 


45.8329 
45.9684 
46.1041 


2.60192 
2.60384 
2.60576 


8.22800 
8.23408 
8.24015 


6.80 


46.2400 


2.60768 


8.24621 


6.81 
6.82 
6.83 


46.3761 
46.5124 
46.6489 


2.60960 
2.61151 
2.61343 


8.25227 
8.25833 
8.26438 


6.84 
6.85 
6.86 


46.7856 
46.9225 
47.0596 


2.61534 
2.61725 
2.61916 


8.27043 
8.27647 
8.28251 


6.87 
6.88 
6.89 


47.1969 
47.3344 
47.4721 


2.62107 

2.62298 
2,62488 


8.28855 
8.29458 
8.30060 


6.90 


47.6100 


2.62679 


8.30662 


6.91 
6.92 
6.93 


47.7481 
47.8864 
48.0249 


2.62869 
2.63059 
2.63249 


8.31264 
8.31865 
8.32466 


6.94 
6.95 
6.96 


48.1636 
48.3025 
48.4416 


2.63439 
2.63629 
2.63818 


8.33067 
8.33667 
8.34266 


Q7 
u.y / 

6.98 

6.99 


AH CftHQ 

48.7204 
48.8601 


2.64197 
2.64386 


o.o*toO{) 

8.35464 
8.36062 


7.00 


49.0000 


2.64575 


8,36660 


N 


N 2 


Vn 


VlON 
Table I (Continued) 








N 




\ X 


VlON N 


N 2 


VN 


v'lOX 


- 


.00 


49.0000 


2.64575 


8.36660 


7.50 


56.2500 


2.73861 


8.66025 


7 
7 


^.02 
.03 


49.1401 
49.2804 
i 49.4209 


2.64764 
2.64953 
2.65141 


8.37257 
8.37854 
8.38451 


7.51 
7.52 
7.53 


56.4001 
56.5504 
56.7009 


2.74044 
2.74226 
2.74408 


8.66603 
8.67179 
8.67756 


7 

7 


.04 

05 
L06 


49.5616 
49.7025 
149.8436 


2.65330 
2.65518 
2.65707 


8.39047 
8.39643 
8.40238 


7.54 
7.55 
7.56 


56.8516 
57.0025 
57.1536 


2.74591 
2.74773 
2.74955 


8.68332 
8.68907 
8.69483 


707 
7 0S 
7,09 


49.9849 
50.1264 
50!2681 


2.65895 
2.66083 
2.66271 


8.40833 
8.41427 
8.42021 


7.57 
7.58 
7.59 


57.3049 

0/ .'■tOOt 

57.6081 


2.75136 
2.75518 
2.75500 


8.70057 
8.70632 
8.71206 


7.10 


50.4100 


2.66458 


8.42615 


7.60 


57.7600 


2.75681 


8.71780 


7 
7 
7 


•\ l 

.12 
.13 


50.5521 
50.6944 
50.8369 


2.66646 
2.66833 
2.67021 


8.43208 
8.43801 
8.44393 


7.61 

7.62 
7.63 


57.9121 
58.0644 
58.2169 


2.75S62 
2,76043 
2.76225 


872353 
8.72926 
8.73499 


7 

7 
7 


. .4 


50.9796 
51.1225 
51.2656 


2.67208 
2.67395 
2.67582 


8.44985 
8.45577 
8.46168 


7.64 
7.65 
7.66 


58.3696 
58.5225 
58.6756 


2.76405 
2.76586 
2.76767 


8.74071 
8.74643 
8.75214 


7.17 
7.S8 
7.ip 


51.4089 
51 5524 
5L6961 


2.67769 
2.67955 
2.68142 


8.46759 
8.47349 
8.47939 


7.67 
7.68 
7.69 


58.8289 

O O 1 Zf O i> £ 

59.1361 


2.76948 
2.77128 
2.77308 


8.75785 
8.76356 
8.76926 


7.2jo 


51.8400 


2.68328 


8.48528 


7.70 


59.2900 


2.77489 


8.77496 


7.2i 
7.22 
7.23 


51.9841 
62.1284 
52.2729 


2.68514 
2.68701 
2.68887 


8.49117 
8.49706 
8,50294 


7.71 
7.72 
7.73 


59.4441 
59.5984 
59.7529 


2.77669 
2.77849 
2.78029 


8.78066 
8.78635 
8.79204 


7.24 
7.25 
7.2$ 


52.4176 
52.5625 
52.7076 


2.69072 
2.69258 
2.69444 


8.50882 
8.51469 
8.52056 


7.74 
7.75 
7.76 


59.9076 
60.0625 
60.2176 


2.78209 
2.78388 
2.78568 


8.79773 
8.80341 
8.80909 


7.27 
7,28 
7.29 


52.8529 
52.9984 
53.1441 


2.69629 
2.69815 
2.70000 


8.52643 
8.53229 
8.53815 


7.77 
7.78" 
7.79 


60.3729 
60.6841 


2.78747 
2.78927 
2.79106 


8,81476 
8.82043 
8.82610 


7.30 


53.: 


2900 


2.70185 


8.54400 


7.80 


60.8400 


2.79285 


8.83176 


7.31 ! 
7.32* 
7.33 


514361 

5.15824 
53.7289 


2.70370 
2.70555 
2.70740 


8.54985 
8.55570 
8.56154 


7.81 
7.82 
7.83 


60.9961 
61.1524 
61.3089 


2.79464 
2.79643 
2.79821 


8.83742 
8.84308 
8.84873 


7.34 
7.35 
7.36 


53.8756 
54.0225 
54.1696 


2.70924 
2.71109 
2.71293 


8.56738 
8.57321 
8.57904 


7.84 
7.85 
7.86 


61.4656 
61.6225 
61.7796 


2.80000 
2.80179 
2.80357 


8.85438 
8.86002 
8.86566 


7.37 
7.38 
7.39 


54.3169 
54.4644 
54.6121 


2.71477 
2.71662 
2.71846 


8.58487 
8.59069 
8.59651 


7.87 
7.88 
7.89 


61.9369 
62.0944 
62.2521 


2.80535 
2.80713 
2.80891 


8.87130 
8.87694 
8.88257 


7.40 


54.7600 


2.72029 


8.60233 


7.90 


62.4100 


2.81069 


8.88819 


7.41 
7.42 
7.43 


54.9081 
55.0564 
55.2049 


2.72213 
2.72397 
2.72580 


8.60814 
8.61394 
8.61974 


7.91 
7.92 
7.93 


62.5681 
62.7264 
62.8849 


2.81247 
2.81425 
2.81603 


8.89382 
8.89944 
8.90505 


7.44 
7.45 
7.46 


55.3536 
55.5025 
55.6516 


2.72764 
2.72947 
2.73130 


8.62554 

8.63134 

8.63713 


7.94 
7.95 
7.96 


63.0436 
63.2025 
63.3616 


2.81780 
2.81957 
2.82135 


8.91067 
8.91628 
8.92188 


7.47 
7.48 
7.49 


55.8009 
55.9504 
56.1001 


2.73313 
2.73496 
2.73679 


8.64292 
8.64870 
8.65448 


7.97 
7.98 
7.99 


63.5209 
63.6804 
63.8401 


2.82312 
2.82489 
2.82666 


8.92749 
8.93308 
8.93868 


7.50 


56.2500 


2.73861 


8.66025 


8.00 


64.0000 


2.82843 


8.94427 


N    N²    √N    √(10N)    N    N²    √N    √(10N) 
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Table I (Continued) 



N    N²    √N    √(10N) 


8.00 


64.0000 


2.82843 


8.94427 


8.01 
8.02 
8.03 


64.1601 
64.3204 
64.4809 


2.83019 
2.83196 
2.83373 


8.94986 
8.95545 
8.96103 


8.04 
8.05 
8.06 


64.6416 
64.8025 
64.9636 


2.83549 
2.83725 
2.83901 


8.96660 
8.97218 
8.97775 


8.07 
8.08 
8.09 


65.1249 
65.2864 
65.4481 


2.84077 

2.84253 
2.84429 


8.98332 
8.98888 
8.99444 


8.10 


65.6100 


2.84605 


9.00000 


8.11 
8.12 
8.13 


65.7721 
65.9344 
66.0969 


2.84781 
2.84956 
2.85132 


9.00555 
9.01110 
9.01665 


8.14 
8.15 
8.16 


66.2596 
66.4225 
66.5856 


2.85307 
2.85482 
2.85657 


9.02219 
9.02774 
9.03327 


8.17 
8.18 
8.19 


66.7489 
66.9124 
67.0761 


2.85832 
2.86007 
2.86182 


9.03881 
9.04434 
9.04986 


8.20 


67.2400 


2.86356 


9.05539 


8.21 

8.22 
8.23 


67.4041 
67.5684 
67.7329 


2.86531 
2.86705 
2.86880 


9.06091 
9.06642 
9.07193 


8.24 
8.25 
8.26 


67.8976 
68.0625 
68.2276 


2.87054 
2.87228 
2.87402 


9.07744 
9.08295 
9.08845 


8.27 
8.28 
8.29 


68.3929 
68.5584 
68.7241 


2.87576 
2.87750 
2.87924 


9.09395 
9.09945 
9.10494 


8.30 


68.8900 


2.88097 


9.11043 


8.31 
8.32 
8.33 


69.0561 
69.2224 
69.3889 


2.88271 
2.88444 
2.88617 


9.11592 
9.12140 
9.12688 


8.34 
8.35 
8.36 


69.5556 
69.7225 
69.8896 


2.88791 

2.88964 
2.89137 


9.13236 
9.13783 
9.14330 


8.37 
8.38 
8.39 


70.0569 
70.2244 
70.3921 


2.89310 
2.89482 
2.89655 


9.14877 
9.15423 
9.15969 


8.40 


70.5600 


2.89828 


9.16515 


8.41 
8.42 
8.43 


70.7281 
70.8964 
71.0649 


2.90000 
2.90172 
2.90345 


9.17061 
9.17606 
9.18150 


8.44 

8.45 

8.46 


71.2336 
71.4025 
71.5716 


2.90517 
2.90689 
2.90861 


9.18695 
9.19239 
9.19783 


8.47 
8.48 
8.49 


71.7409 
71.9104 
72.0801 


2.91033 
2.91204 
2.91376 


9.20326 
9.20869 
9.21412 


8.50 


72.2500 


2.91548 


9.21954 


N    N²    √N    √(10N) 


N    N²    √N    √(10N) 


8.50 


72.2500 


2.91548 


9.21954 


8.51 
8.52 
8.53 


72.4201 
72.5904 
72.7609 


2.91719 
2.91890 
2.92062 


9.22497 
9.23038 
9.23580 


8.54 
8.55 
8.56 


72.9316 
73.1025 
73.2736 


2.92233 
2.92404 
2.92575 


9.24121 
9.24662 
9.25203 


8.57 
8.58 
8.59 


73.4449 
73.6164 
73.7881 


2.92746 
2.92916 
2.93087 


9.25743 
9.26283 
9.26823 


8.60 


73.9600 


2.93258 


9.27362 


8.61 
8.62 
8.63 


74.1321 
74.3044 
74.4769 


2.93428 
2.93598 
2.93769 


9.27901 
9.28440 
9.28978 


8.64 
8.65 
8.66 


74.6496 
74.8225 
74.9956 


2.93939 
2.94109 
2.94279 


9.29516 
9.30054 
9.30591 


8.67 
8.68 
8.69 


75.1689 
75.3424 
75.5161 


2.94449 
2.94618 
2.94788 


9.31128 
9.31665 
9.32202 


8.70 


75.6900 


2.94958 


9.32738 


8.71 
8.72 
8.73 


75.8641 
76.0384 
76.2129 


2.95127 
2.95296 
2.95466 


9.33274 
9.33809 
9.34345 


8.74 
8.75 
8.76 


76.3876 
76.5625 
76.7376 


2.95635 
2.95804 
2.95973 


9.34880 
9.35414 
9.35949 


8.77 
8.78 
8.79 


76.9129 
77.0884 
77.2641 


2.96142 
2.96311 
2.96479 


9.36483 
9.37017 
9.37550 


8.80 


77.4400 


2.96648 


9.38083 


8.81 
8.82 
8.83 


77.6161 
77.7924 
77.9689 


2.96816 
2.96985 
2.97153 


9.38616 
9.39149 
9.39681 


8.84 
8.85 
8.86 


78.1456 

78.3225 
78.4996 


2.97321 
2.97489 
2.97658 


9.40213 
9.40744 
9.41276 


8.87 

8.88 
8.89 


78.6769 
78.8544 
79.0321 


2.97825 
2.97993 
2.98161 


9.41807 
9.42338 
9.42868 


8.90 


79.2100 


2.98329 


9.43398 


8.91 
8.92 
8.93 


79.3881 
79.5664 
79.7449 


2.98496 
2.98664 
2.98831 


9.43928 
9.44458 
9.44987 


8.94 
8.95 
8.96 


79.9236 
80.1025 
80.2816 


2.98998 
2.99166 
2.99333 


9.45516 
9.46044 
9.46573 


8.97 
8.98 
8.99 


80.4609 
80.6404 
80.8201 


2.99500 
2.99666 
2.99833 


9.47101 
9.47629 
9.48156 


9.00 


81.0000 


3.00000 


9.48683 


N    N²    √N    √(10N) 
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Table I (Continued) 























N    N²    √N    √(10N)    N    N²    √N    √(10N) 


9.00 


81.0000 


3.00000 


9.48683 


9.50 


90.2500 


3.08221 


9.74679 


9.01 


81.1801 


3.00167 


9.49210 


9.51 


90.4401 


3.08383 


9.75192 




9.02 


81.3604 


3.00333 


9.49737 


9.52 


90.6304 


3.08545 


9.75705 




9.03 


81.5409 


3.00500 


9.50263 


9.53 


90.8209 


3.08707 


9.76217 


9.04 


81.7216 


3.00666 


9.50789 


9.54 


91.0116 


3.08869 


9.76729 




9.05 


81.9025 


3.00832 


9.51315 


9.55 


91.2025 


3.09031 


9.77241 




9.06 


82.0836 


3.00998 


9.51840 


9.56 


91.3936 


3.09192 


9.77753 




9.07 


82.2649 


3.01164 


9.52365 


9.57 


91.5849 


3.09354 


9.78264 




9.08 


82.4464 


3.01330 


9.52890 


9.58 


91.7764 


3.09516 


9.78775 




9.09 


82.6281 


3.01496 


9.53415 


9.59 


91.9681 


3.09677 


9.79285 




9.10 


82.8100 


3.01662 


9.53939 


9.60 


92.1600 


3.09839 


9.79796 




9.11 


82.9921 


3.01828 


9.54463 


9.61 


92.3521 


3.10000 


9.80306 


9.12 


83.1744 


3.01993 


9.54987 


9.62 


92.5444 


3.10161 


9.80816 


9.13 


83.3569 


3.02159 


9.55510 


9.63 


92.7369 


3.10322 


9.81326 


9.14 


83.5396 


3.02324 


9.56033 


9.64 


92.9296 


3.10483 


9.81835 


9.15 


83.7225 


3.02490 


9.56556 


9.65 


93.1225 


3.10644 


9.82344 


9.16 


83.9056 


3.02655 


9.57079 


9.66 


93.3156 


3.10805 


9.82853 


9.17 


84.0889 


3.02820 


9.57601 


9.67 


93.5089 


3.10966 


9.83362 


9.18 


84.2724 


3.02985 


9.58123 


9.68 


93.7024 


3.11127 


9.83870 


9.19 


84.4561 


3.03150 


9.58645 


9.69 


93.8961 


3.11288 


9.84378 


9.20 


84.6400 


3.03315 


9.59166 


9.70 


94.0900 


3.11448 


9.84886 


9.21 


84.8241 


3.03480 


9.59687 


9.71 


94.2841 


3.11609 


9.85393 


9.22 


85.0084 


3.03645 


9.60208 


9.72 


94.4784 


3.11769 


9.85901 


9.23 


85.1929 


3.03809 


9.60729 


9.73 


94.6729 


3.11929 


9.86408 


9.24 


85.3776 


3.03974 


9.61249 


9.74 


94.8676 


3.12090 


9.86914 


9.25 


85.5625 


3.04138 


9.61769 


9.75 


95.0625 


3.12250 


9.87421 


9.26 


85.7476 


3.04302 


9.62289 


9.76 


95.2576 


3.12410 


9.87927 


9.27 


85.9329 


3.04467 


9.62808 


9.77 


95.4529 


3.12570 


9.88433 


9.28 


86.1184 


3.04631 


9.63328 


9.78 


95.6484 


3.12730 


9.88939 


9.29 


86.3041 


3.04795 


9.63846 


9.79 


95.8441 


3.12890 


9.89444 


9.30 


86.4900 


3.04959 


9.64365 


9.80 


96.0400 


3.13050 


9.89949 


9.31 


86.6761 


3.05123 


9.64883 


9.81 


96.2361 


3.13209 


9.90454 


9.32 


86.8624 


3.05287 


9.65401 


9.82 


96.4324 


3.13369 


9.90959 


9.33 


87.0489 


3.05450 


9.65919 


9.83 


96.6289 


3.13528 


9.91464 


9.34 


87.2356 


3.05614 


9.66437 


9.84 


96.8256 


3.13688 


9.91968 


9.35 


87.4225 


3.05778 


9.66954 


9.85 


97.0225 


3.13847 


9.92472 


9.36 


87.6096 


3.05941 


9.67471 


9.86 


97.2196 


3.14006 


9.92975 


9.37 


87.7969 


3.06105 


9.67988 


9.87 


97.4169 


3.14166 


9.93479 


9.38 


87.9844 


3.06268 


9.68504 


9.88 


97.6144 


3.14325 


9.93982 


9.39 


88.1721 


3.06431 


9.69020 


9.89 


97.8121 


3.14484 


9.94485 


9.40 


88.3600 


3.06594 


9.69536 


9.90 


98.0100 


3.14643 


9.94987 


9.41 


88.5481 


3.06757 


9.70052 


9.91 


98.2081 


3.14802 


9.95490 


9.42 


88.7364 


3.06920 


9.70567 


9.92 


98.4064 


3.14960 


9.95992 


9.43 


88.9249 


3.07083 


9.71082 


9.93 


98.6049 


3.15119 


9.96494 


9.44 


89.1136 


3.07246 


9.71597 


9.94 


98.8036 


3.15278 


9.96995 


9.45 


89.3025 


3.07409 


9.72111 


9.95 


99.0025 


3.15436 


9.97497 


9.46 


89.4916 


3.07571 


9.72625 


9.96 


99.2016 


3.15595 


9.97998 


9.47 


89.6809 


3.07734 


9.73139 


9.97 


99.4009 


3.15753 


9.98499 


9.48 


89.8704 


3.07896 


9.73653 


9.98 


99.6004 


3.15911 


9.98999 


9.49 


90.0601 


3.08058 


9.74166 


9.99 


99.8001 


3.16070 


9.99500 


9.50 


90.2500 


3.08221 


9.74679 


10.00 


100.000 


3.16228 


10.0000 






N    N²    √N    √(10N)    N    N²    √N    √(10N) 
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Table IIa Random Digits 



39 65 


76 


45 


45 


19 


90 


69 


64 


61 


20 


26 


36 


31 


62 


58 24 97 14 97 


95 06 70 99 00 


73 71 


23 


70 


90 


65 


97 


60 


12 


11 


31 


56 


34 


19 


19 


47 83 75 51 33 


30 62 38 20 46 


72 20 


47 


33 


84 


51 


67 


47 


97 


19 


98 


40 


07 


17 


66 


23 05 09 51 80 


59 78 11 52 49 


75 17 


25 


69 


17 


17 


95 


21 


78 


58 


24 


33 


45 


77 


48 


69 81 84 09 29 


93 22 70 45 80 


37 48 


79 


88 


74 


63 


52 


06 


34 


30 


01 


31 


60 


10 


27 


35 07 79 71 53 


28 99 52 01 41 


02 89 


08 


16 


94 


85 


53 


83 


29 


95 


56 


27 


09 


24 


43 


21 78 55 09 82 


72 61 88 73 61 


87 18 


15 


70 


07 


37 


79 


49 


12 


38 


48 


13 


93 


55 


96 


41 92 45 71 51 


09 18 25 58 94 


98 83 


71 


70 


15 


89 


09 


39 


59 


24 


00 


06 


41 


41 


20 


14 36 59 25 47 


54 45 n 24 89 


10 08 


58 


07 


04 


76 


62 


16 


48 


68 


58 


76 


17 


14 


86 


59 53 11 52 21 


66 04 18 72 87 


47 90 


56 


37 


31 


71 


82 


13 


50 


41 


27 


55 


10 


24 


92 


28 04 67 53 44 


95 23 00 84 47 


93 05 


31 


03 


07 


34 


18 


04 


52 


35 


74 


13 


39 


35 


22 


68 95 23 92 35 


36 63 70 35 33 


21 89 


11 


47 


99 


11 


20 


99 


45 


18 


76 


51 


94 


84 


86 


13 79 93 37 55 


98 16 04 41 67 


95 18 


94 


06 


97 


27 


37 


83 


28 


71 


79 


57 


95 


13 


91 


09 61 87 25 21 


56 20 11 32 44 


97 08 


31 


55 


73 


10 


65 


81 


92 


59 


77 


31 


61 


95 


46 


20 44 90 32 64 


26 99 76 75 63 


69 26 


88 


86 


13 


59 


71 


74 


17 


32 


48 


38 


75 


93 


29 


73 37 32 04 05 


60 82 29 20 25 


41 47 


10 


25 


03 


87 


63 


93 


95 


17 


81 


83 


83 


04 


49 


77 45 85 50 51 


79 88 01 97 30 


91 94 


14 


63 


62 


08 


61 


74 


51 


69 


92 


79 


43 


89 


79 


29 18 94 51 23 


14 85 11 47 23 


80 06 


54 


18 


47 


08 


52 


85 


08 


40 


48 


40 


35 


94 


22 


72 65 71 08 86 


50 03 42 99 36 


67 72 


77 


63 


99 


89 


85 


84 


46 


06 


64 


71 


06 


21 


66 


89 37 20 70 01 


61 65 70 22 12 


59 40 


24 


13 


75 


42 


29 


72 


23 


19 


06 


94 


76 


10 


03 


81 30 15 39 14 


81 83 17 16 33 



63 62 


06 


34 


41 


79 


53 


36 


02 


95 


94 


61 


09 


43 


62 


20 21 14 68 86 


84 95 48 46 45 


78 47 


23 


53 


90 


79 


93 


96 


38 


63 


34 


85 


52 


05 


09 


85 43 01 72 73 


14 93 87 81 40 


87 68 


62 


15 


43 


97 


48 


72 


66 


48 


53 


16 


71 


13 


81 


59 97 50 99 52 


24 62 20 42 31 


47 60 


92 


10 


77 


26 


97 


05 


73 


51 


88 


46 


38 


03 


58 


72 68 49 29 31 


75 70 16 08 24 


56 88 


87 


59 


41 


06 


87 


37 


78 


48 


65 


88 


69 


58 


39 


88 02 84 27 83 


85 81 56 39 38 


22 17 


68 


65 


84 


87 


02 


22 


57 


51 


68 


69 


80 


95 


44 


11 29 01 95 80 


49 34 35 86 47 


19 36 


27 


59 


46 


39 


77 


32 


77 


09 


79 


57 


92 


36 


59 


89 74 39 82 15 


08 58 94 34 74 


16 77 


23 


02 


77 


28 


06 


24 


25 


93 


22 


45 


44 


84 


11 


87 80 61 65 31 


09 71 91 74 25 


78 43 


76 


71 


61 


97 


67 


63 


99 


61 


80 


45 


67 


93 


82 


59 73 19 85 23 


53 33 65 97 21 


03 28 


28 


26 


08 


69 


30 


16 


09 


05 


53 


58 


47 


70 


93 


66 56 45 65 79 


45 56 20 19 47 


04 31 


17 


21 


56 


33 


73 


99 


19 


87 


26 


72 


39 


27 


67 


53 77 57 68 93 


60 61 97 22 61 


61 06 


98 


03 


91 


87 


14 


77 


43 


96 


43 


00 


65 


98 


50 


45 60 33 01 07 


98 99 46 50 47 


23 68 


35 


26 


00 


99 


53 


93 


61 


28 


52 


70 


05 


48 


34 


56 65 05 61 86 


90 92 10 70 80 


15 39 


25 


70 


99 


93 


86 


52 


77 


65 


15 


33 


59 


05 


28 


22 87 26 07 47 


86 96 98 29 06 


58 71 


96 


30 


24 


18 


46 


23 


34 


27 


85 


13 


99 


24 


44 


49 18 09 79 49 


74 16 32 23 02 


93 22 


53 


64 


39 


07 


10 


63 


76 


35 


87 


03 


04 


79 


88 


08 13 13 85 51 


55 34 57 72 69 


78 76 


58 


54 


74 


92 


38 


70 


96 


92 


52 


06 


79 


79 


45 


82 63 18 27 44 


69 66 92 19 09 


61 81 


31 


96 


82 


00 


57 


25 


60 


59 


46 


72 


60 


18 


77 


55 66 12 62 11 


08 99 55 64 57 


42 88 


07 


10 


05 


24 


98 


65 


63 


21 


47 


21 


61 


88 


32 


27 80 30 21 60 


10 92 35 36 12 


77 94 


30 


05 


39 


28 


10 


99 


00 


27 


12 


73 


73 


99 


12 


49 99 57 94 82 


96 88 57 17 91 
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Table IIb Random Normal Numbers, μ = 0, σ = 1 



0.464 0.137 
0.060 -2.526 
1.486 -0.354 
1 -0.472 
-0.555 



-0.899 
I. 63 



-1.787 
-0.105 
-1.339 
1.041 
0.279 

-1.805 
-1.186 
0.658 
-0.439 
-1.399 

0.199 
0.159 
2.273 
0.041 
-1.132 

0.768 
0.375 
-0.513 
0.292 
1.026 

-1.334 
-0.287 
0.161 
-1.346 
-1.250 

0.630 
0.375 
-1.420 
-0.151 
-0.309 



-0.261 
-0.575 



4-0.5: 

1.9< 
0.489 
0.243 
0.531 



0.424 -Uo.444 

0.593 1 0.658 

0.862 -0.885 

0.235 -0.62?? 

-0.853 ! 0.402 



2.455 


— 0.323 


— 0.068 


0.296 


— 0.288 


1 .298 


0.241 


—0.957 


— 0.531 


—0.194 


0.543 


— 1.558 


0.187 


— 1.190 


0.022 


0.525 


— 0.634 


0.697 


0.926 


1.375 


0.785 


— 0.963 


—0.853 


— 1.865 


1.279 


3.521 


0.571 


— 1.851 


0.194 


1 .192 


—0.501 


— 0.273 


0.046 


0.321 


2.945 


1.974 


-0.258 


0.412 


0.439 


-0.035 


— 0.525 


0.595 


0.881 


— 0.934 


1 .579 


0.361 


— 1.885 


0.371 


0.007 


0.769 


0.971 


0.712 


1 .090 


— 0.631 


— 0.255 


-0.702 


-0.162 


-0.136 


1 .033 


0.203 


0.448 


0.748 


— 0.423 


— 0.432 


— 1.618 


— 0.345 


-0.511 


— 2.051 


— 0.457 


— 0.218 


0.857 


— 0.465 


0.378 


0.761 


0.181 


-0.736 


0.960 


-1.530 


-0.260 


0.120 


— 0.057 


— 1.229 


— 0.486 


0.856 


—0.491 


— 1.983 


— 2.830 


-0.238 


1.356 


— 0.561 


— 0.256 


— 0.212 


0.219 


0.779 


0.953 


— 0.869 


— 0.918 


3.598 


0.065 


0.415 


— 0.169 


0.313 


— 0.973 


-1.016 


0.012 


— 0.725 


1.147 


— 0.121 


1.096 


0.481 


— 1.691 


0.417 


-0.911 


1.231 


-0.199 


-0.246 


1.239 


-2.574 


-0.558 


0.056 


1.237 


1 .046 


— 0.508 


— 1 .630 


— 0. 146 


—0.392 


— 0.627 


0.561 


— 1 .384 


0.360 


— 0.992 


— 0.1 16 


— 1.698 


— 2.832 


— 1 .108 


— 2.357 


— 0.959 


0.424 


0.969 


— 1.141 


— 1 .041 


0.362 


— 1 .726 


1.956 


0.731 


1 .377 


0.983 


— 1.330 


1.620 


— 1 .040 


0.524 


— 0.281 


0.717 


-0.873 


- 1 .096 


-1.396 


1.047 


0.089 


—0.573 


0.932 


— 1 .633 


0.542 


0.250 


— 0.166 


0.032 


0.079 


0.471 


— 1 .029 


1.114 


0.882 


1.265 


— 0.202 


0.151 


-0.376 


— 0.310 


0.479 


1.151 


— 1.210 


— 0.927 


0.425 


0.290 


— 0.902 


0.610 


1.709 


— 1 .939 


0.891 


— 0.227 


0.602 


0.873 


— 0.437 


—0.220 


— 0.057 


0.3S5 


-0.649 


-0.577 


0.237 


-0.289 


0.513 


0.738 


-0.300 


— 1.083 


— 0.219 


— 0.291 


1.221 


1.119 


0.004 


-2.015 


— 0.594 


— 0.313 


0.084 


— 2.828 


—0.439 


— 0.792 


— 1.275 


— 0.623 


-1.047 


0.606 


— 0.747 


0.247 


1 .291 


0.063 


— 1.793 


— 0.699 


-1.347 


0.121 


0.790 


—0.584 


0.541 


0.484 


— 0.986 


0.481 


0.996 


0.921 


0.145 


0.446 


-1.661 


1.045 


-1.363 


-0.586 


-1.023 


— 1.473 


0.034 


— 2.127 


0.665 


0.084 


-0.880 


-0.579 


0.551 


—0.851 


0.234 


— 0.656 


0.340 


— 0.086 


— 0.158 


-0.120 


0.418 


0.210 


—0.736 


1.041 


0.008 


0.427 


— 0.831 


0.191 


0.074 


1 .266 


- 1.206 


— 0.899 


0.110 


— 0.528 


— 0.813 


0.071 


0.524 


-0.574 


-0.491 


-1.114 


1.297 


-1.433 


-1.345 


-3.001 


0.479 


— 0.568 


— 0.109 


-0.515 


-0.566 


2.923 


0.500 


0.359 


0.326 


— 0.254 


0.574 


— 0.451 


— 1.181 


— 1 .190 


— 0.318 


—0.094 


1.114 


— 0.921 


— 0.509 


1.410 


— 0.518 


0.192 


— 0.432 


1.501 


1 .068 


-1.202 


0.394 


— 1.045 


0.843 


0.942 


1 .045 


0.031 


0.772 


-0.288 


1.810 


1.378 


0.584 


1.216 


0.733 


0.402 


0.226 


0.782 


0.060 


0.499 


-0.431 


1.705 


1.164 


0.884 


— 0.298 


0.247 


— 0.491 


0.665 


-0.135 


— 0.145 


— 0.498 


0.457 


1 .064 


-1.711 


— 1 .186 


0.754 


—0.732 


— 0.066 


1 .006 


— 0.798 


0.162 


—0.430 


— 0.762 


0.298 


1 .049 


1 .810 


2.885 


-0.768 


— 0.129 


0.416 


— {.541 


1.456 


2.040 


— 0.124 


0.196 


0.023 


-1.204 


0.593 


0.993 


-0.106 


0.116 


0.484 


-1.272 


1.066 


1.097 


-1.127 


-1.407 


-1.579 


-1.616 


1.458 


1.262 


0.736 


-0.916 


-0.142 


-0.504 


0.532 


1.381 


0.022 


-0.281 


-0.342 


1.222 


-0.023 


-0.463 


-0.899 


-0.394 


-0.538 


1.707 


-0.188 


-1.153 


0.777 


0.833 


0.410 


-0.349 


-1.094 


0.580 


1.395 


1.298 
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Table IIIa Binomial Coefficients 



n \ m    0    1     2      3      4       5       6       7        8        9       10 

 0       1 
 1       1    1 
 2       1    2     1 
 3       1    3     3      1 
 4       1    4     6      4      1 
 5       1    5    10     10      5       1 
 6       1    6    15     20     15       6       1 
 7       1    7    21     35     35      21       7       1 
 8       1    8    28     56     70      56      28       8        1 
 9       1    9    36     84    126     126      84      36        9        1 
10       1   10    45    120    210     252     210     120       45       10        1 
11       1   11    55    165    330     462     462     330      165       55       11 
12       1   12    66    220    495     792     924     792      495      220       66 
13       1   13    78    286    715    1287    1716    1716     1287      715      286 
14       1   14    91    364   1001    2002    3003    3432     3003     2002     1001 
15       1   15   105    455   1365    3003    5005    6435     6435     5005     3003 
16       1   16   120    560   1820    4368    8008   11440    12870    11440     8008 
17       1   17   136    680   2380    6188   12376   19448    24310    24310    19448 
18       1   18   153    816   3060    8568   18564   31824    43758    48620    43758 
19       1   19   171    969   3876   11628   27132   50388    75582    92378    92378 
20       1   20   190   1140   4845   15504   38760   77520   125970   167960   184756 


Note. $\binom{n}{m} = \dfrac{n(n-1)(n-2)\cdots(n-m+1)}{m(m-1)(m-2)\cdots 3\cdot 2\cdot 1}$. For coefficients missing from the above table, use the relation 

$\binom{n}{m} = \binom{n}{n-m}$; e.g., $\binom{20}{11} = \binom{20}{9} = 167960$. 
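
The product formula and the symmetry relation in the note can be checked numerically. What follows is a minimal Python sketch and is not part of the original text; the function name `binomial_coefficient` is an illustrative assumption, and the standard-library `math.comb` appears only as an independent check.

```python
from math import comb


def binomial_coefficient(n: int, m: int) -> int:
    """Number of ways to choose m items from n, via the product formula."""
    if m < 0 or m > n:
        return 0
    # Symmetry relation from the note: C(n, m) = C(n, n - m).
    m = min(m, n - m)
    result = 1
    for i in range(1, m + 1):
        # After step i, result equals C(n - m + i, i), so the
        # integer division is always exact.
        result = result * (n - m + i) // i
    return result


if __name__ == "__main__":
    # A coefficient missing from the table, recovered by symmetry.
    print(binomial_coefficient(20, 11), binomial_coefficient(20, 9))
    # Cross-check every table entry against the standard library.
    print(all(binomial_coefficient(n, m) == comb(n, m)
              for n in range(21) for m in range(n + 1)))
```

Multiplying and dividing step by step keeps every intermediate value an integer, which avoids forming the large factorials directly.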
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Table IIIb Individual Binomial Probabilities p(x) 



.05 
.9500 

.0500 



.1715 



.2036 



.6983 
.2573 
.0406 
.0036 
.0002 



.0000 

.0000 



.10 


.15 


.20 


π 

.25 


.30 


.35 


.40 


.45 


.50 


.9000 


.8500 


.8000 


.7500 


.7000 


.6500 


.6000 


.5500 


.5000 


.1000 


.1500 


.2000 


.2500 


.3000 


.3500 


.4000 


.4500 


.5000 


.8100 


.7225 


.6400 


.5625 


.4900 


.4225 


.3600 


.3025 


.2500 


.1800 


.2550 


.3200 


.3750 


.4200 


.4550 


.4800 


.4950 


.5000 


.0100 


.0225 


.0400 


.0625 


.0900 


.1225 


.1600 


.2025 


.2500 


.7290 


.6141 


.5120 


.4219 


.3430 


.2746 


.2160 


.1664 


.1250 


.2430 


.3251 


.3840 


.4219 


.4410 


.4436 


.4320 


.4084 


.3750 


.0270 


.0574 


.0960 


.1406 


.1890 


.2389 


.2880 


.3341 


.3750 


.0010 


.0034 


.0080 


.0156 


.0270 


.0429 


.0640 


.0911 


.1250 


.6561 


.5220 


.4096 


.3164 


.2401 


.1785 


.1296 


.0915 


.0625 


.2916 


.3685 


.4096 


.4219 


.4116 


.3845 


.3456 


.2995 


.2500 


.0486 


.0975 


.1536 


.2109 


.2646 


.3105 


.3456 


.3675 


.3750 


.0036 


.0115 


.0256 


.0469 


.0756 


.1115 


.1536 


.2005 


.2500 


.0001 


.0005 


.0016 


.0039 


.0081 


.0150 


.0256 


.0410 


.0625 


.5905 


.4437 


.3277 


.2373 


.1681 


.1160 


.0778 


.0503 


.0312 


.3280 


.3915 


.4096 


.3955 


.3602 


.3124 


.2592 


.2059 


.1562 


.0729 


.1382 


.2048 


.2637 


.3087 


.3364 


.3456 


.3369 


.3125 


.0081 


.0244 


.0512 


.0879 


.1323 


.1811 


.2304 


.2757 


.3125 


.0004 


.0022 


.0064 


.0146 


.0284 


.0488 


.0768 


.1128 


.1562 


.0000 


.0001 


.0003 


.0010 


.0024 


.0053 


.0102 


.0185 


.0312 


.5314 


.3771 


.2621 


.1780 


.1176 


.0754 


.0467 


.0277 


.0156 


.3543 


.3993 


.3932 


.3560 


.3025 


.2437 


.1866 


.1359 


.0938 


.0984 


.1762 


.2458 


.2966 


.3241 


.3280 


.3110 


.2780 


.2344 


.0146 


.0415 


.0819 


.1318 


.1852 


.2355 


.2765 


.3032 


.3125 


.0012 


.0055 


.0154 


.0330 


.0595 


.0951 


.1382 


.1861 


.2344 


.0001 


.0004 


.0015 


.0044 


.0102 


.0205 


.0369 


.0609 


.0938 


.0000 


.0000 


.0001 


.0002 


.0007 


.0018 


.0041 


.0083 


.0156 


.4783 


.3206 


.2097 


.1335 


.0824 


.0490 


.0280 


.0152 


.0078 


.3720 


.3960 


.3670 


.3115 


.2471 


.1848 


.1306 


.0872 


.0547 


.1240 


.2097 


.2753 


.3115 


.3177 


.2985 


.2613 


.2140 


.1641 


.0230 


.0617 


.1147 


.1730 


.2269 


.2679 


.2903 


.2918 


.2734 


.0026 


.0109 


.0287 


.0577 


.0972 


.1442 


.1935 


.2388 


.2734 


.0002 


.0012 


.0043 


.0115 


.0250 


.0466 


.0774 


.1172 


.1641 



.0000 .0001 
.0000 .0000 



.0004 .0013 
.0000 .0001 



.0036 .0084 
.0002 .0006 



.0172 .0320 .0547 
.0016 .0037 .0078 



If π > .50, interchange π and (1 - π). 
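
The table entries and the interchange rule in the footnote can be reproduced directly. The Python sketch below is an illustration under stated assumptions, not the book's own procedure: the function names are hypothetical, and it takes the interchange of π and (1 - π) to go together with reading x as n - x, which is what the symmetry of the binomial formula implies.

```python
from math import comb


def binomial_pmf(x: int, n: int, pi: float) -> float:
    """p(x) = C(n, x) * pi^x * (1 - pi)^(n - x)."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)


def binomial_pmf_via_table_rule(x: int, n: int, pi: float) -> float:
    """Look up p(x) using only values of pi that are at most .50."""
    if pi > 0.5:
        # Assumption: x successes with probability pi is the same event
        # as n - x failures, each failure having probability 1 - pi.
        return binomial_pmf(n - x, n, 1 - pi)
    return binomial_pmf(x, n, pi)


if __name__ == "__main__":
    # n = 4, x = 1, pi = .40 is tabulated as .3456.
    print(round(binomial_pmf(1, 4, 0.40), 4))
    # n = 4, x = 3, pi = .60 is not tabulated; the rule maps it to the entry above.
    print(round(binomial_pmf_via_table_rule(3, 4, 0.60), 4))
```

Both calls refer to the same tabulated entry, which is why the table needs to list π only up to .50.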
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Table IIIb (Continued) 



n x 


.05 


.10 


.15 


.20 


.25 


π 

.30 


.35 


.40 


.45 


.50 


8 0 


.6634 


.4305 


.2725 


.1678 


.1001 


.0576 


.0319 


.0168 


.0084 


.0039 


1 


.2793 


.3826 


.3847 


.3355 


.2670 


.1977 


.1373 


.0896 


.0548 


.0312 


2 


.0515 


.1488 


.2376 


.2936 


.3115 


.2965 


.2587 


.2090 


.1569 


.1094 


3 


.0054 


.0331 


.0839 


.1468 


.2076 


.2541 


.2786 


.2787 


.2568 


.2188 


4 


.0004 


.0046 


.0185 


.0459 


.0865 


.1361 


.1875 


.2322 


.2627 


.2734 


5 


.0000 


.0004 


.0026 


.0092 


.0231 


.0467 


.0808 


.1239 


.1719 


.2188 


6 


.0000 


.0000 


.0002 


.0011 


.0038 


.0100 


.0217 


.0413 


.0703 


.1094 


7 


.0000 


.0000 


.0000 


.0001 


.0004 


.0012 


.0033 


.0079 


.0164 


.0312 


8 


.0000 


.0000 


.0000 


.0000 


.0000 


.0001 


.0002 


.0007 


.0017 


.0039 


9 0 


.6302 


.3874 


.2316 


.1342 


.0751 


.0404 


.0207 


.0101 


.0046 


.0020 


1 


.2985 


.3874 


.3679 


.3020 


.2253 


.1556 


.1004 


.0605 


.0339 


.0176 


2 


.0629 


.1722 


.2597 


.3020 


.3003 


.2668 


.2162 


.1612 


.1110 


.0703 


3 


.0077 


.0446 


.1069 


.1762 


.2336 


.2668 


.2716 


.2508 


.2119 


.1641 


4 


.0006 


.0074 


.0283 


.0661 


.1168 


.1715 


.2194 


.2508 


.2600 


.2461 


5 


.0000 


.0008 


.0050 


.0165 


.0389 


.0735 


.1181 


.1672 


.2128 


.2461 


6 


.0000 


.0001 


.0006 


.0028 


.0087 


.0210 


.0424 


.0743 


.1160 


.1641 


7 


.0000 


.0000 


.0000 


.0003 


.0012 


.0039 


.0098 


.0212 


.0407 


.0703 


8 


.0000 


.0000 


.0000 


.0000 


.0001 


.0004 


.0013 


.0035 


.0083 


.0176 


9 


.0000 


.0000 


.0000 


.0000 


.0000 


.0000 


.0001 


.0003 


.0008 


.0020 


10 0 


.5987 


.3487 


.1969 


.1074 


.0563 


.0282 


.0135 


.0060 


.0025 


.0010 


1 


.3151 


.3874 


.3474 


.2684 


.1877 


.1211 


.0725 


.0403 


.0207 


.0098 


2 


.0746 


.1937 


.2759 


.3020 


.2816 


.2335 


.1757 


.1209 


.0763 


.0439 


3 


.0105 


.0574 


.1298 


.2013 


.2503 


.2668 


.2522 


.2150 


.1665 


.1172 


4 


.0010 


.0112 


.0401 


.0881 


.1460 


.2001 


.2377 


.2508 


.2384 


.2051 


5 


.0001 


.0015 


.0085 


.0264 


.0584 


.1029 


.1536 


.2007 


.2340 


.2461 


6 


.0000 


.0001 


.0012 


.0055 


.0162 


.0368 


.0689 


.1115 


.1596 


.2051 


7 


.0000 


.0000 


.0001 


.0008 


.0031 


.0090 


.0212 


.0425 


.0746 


.1172 


8 


.0000 


.0000 


.0000 


.0001 


.0004 


.0014 


.0043 


.0106 


.0229 


.0439 


9 


.0000 


.0000 


.0000 


.0000 


.0000 


.0001 


.0005 


.0016 


.0042 


.0098 


10 


.0000 


.0000 


.0000 


.0000 


.0000 


.0000 


.0000 


.0001 


.0003 


.0010 
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Table IIIc Cumulative Binomial Probabilities in Right-hand Tail 



.05 



.10 


.15 


.20 


.25 


.30 


.35 


.40 


.45 


.50 


.1900 


.2775 


.3600 


.4375 


.5100 


.5775 


.6400 


.6975 


.7500 


.0100 


.0225 


.0400 


.0625 


.0900 


.1225 


.1600 


.2025 


.2500 


.2710 


.3859 


.4880 


.5781 


.6570 


.7254 


.7840 


.8336 


.8750 


.0280 


.0608 


.1040 


.1562 


.2160 


.2818 


.3520 


.4252 


.5000 


.0010 


.0034 


.0080 


.0156 


.0270 


.0429 


.0640 


.0911 


.1250 


.3439 


.4780 


.5904 


.6836 


.7599 


.8215 


.8704 


.9085 


.9375 


.0523 


.1095 


.1808 


.2617 


.3483 


.4370 


.5248 


.6090 


.6875 


.0037 


.0120 


.0272 


.0508 


.0837 


.1265 


.1792 


.2415 


.3125 


.0001 


.0005 


.0016 


.0039 


.0081 


.0150 


.0256 


.0410 


.0625 


.4095 


.5563 


.6723 


.7627 


.8319 


.8840 


.9222 


.9497 


.9688 


.0815 


.1648 


.2627 


.3672 


.4718 


.5716 


.6630 


.7438 


.8125 


.0086 


.0266 


.0579 


.1035 


.1631 


.2352 


.3174 


.4069 


.5000 


.0005 


.0022 


.0067 


.0156 


.0308 


.0540 


.0870 


.1312 


.1875 


.0000 


.0001 


.0003 


.0010 


.0024 


.0053 


.0102 


.0185 


.0312 


.4686 


.6229 


.7379 


.8220 


.8824 


.9246 


.9533 


.9723 


.9844 


.1143 


.2235 


.3447 


.4661 


.5798 


.6809 


.7667 


.8364 


.8906 


.0158 


.0473 


.0989 


.1694 


.2557 


.3529 


.4557 


.5585 


.6562 


.0013 


.0059 


.0170 


.0376 


.0705 


.1174 


.1792 


.2553 


.3438 


.0001 


.0004 


.0016 


.0046 


.0109 


.0223 


.0410 


.0692 


.1094 


.0000 


.0000 


.0001 


.0002 


.0007 


.0018 


.0041 


.0083 


.0156 


.5217 


.6794 


.7903 


.8665 


.9176 


.9510 


.9720 


.9848 


.9922 


.1497 


.2834 


.4233 


.5551 


.6706 


.7662 


.8414 


.8976 


.9375 


.0257 


.0738 


.1480 


.2436 


.3529 


.4677 


.5801 


.6836 


.7734 


.0027 


.0121 


.0333 


.0706 


.1260 


.1998 


.2898 


.3917 


.5000 


.0002 


.0012 


.0047 


.0129 


.0288 


.0556 


.0963 


.1529 


.2266 


.0000 


.0001 


.0004 


.0013 


.0038 


.0090 


.0188 


.0357 


.0625 


.0000 


.0000 


.0000 


.0001 


.0002 


.0006 


.0016 


.0037 


.0078 

i 



2 1 

2 



2 
3 

1 

2 

3 
4 

1 

2 
3 
4 
5 

1 

2 
3 
4 
5 



7 1 
2 
3 
4 
5 

6 
7 



.0975 
.0025 

.1426 
.0072 

.0001 

.1855 
.0140 
.0005 
.0000 

.2262 
.0226 
.0012 
.0000 
.0000 

i 

.2649 
.0328 
.0022 
,000 

l.oool 



.0002 

loooc 
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Table IIIc (Continued) 



n x 0 


.05 


.10 


.15 


.20 


.25 


.30 


.35 


.40 


.45 


.50 


8 1 


.3366 


.5695 


.7275 


.8322 


.8999 


.9424 


.9681 


.9832 


.9916 


.9961 


2 


.0572 


.1869 


.3428 


.4967 


.6329 


.7447 


.8309 


.8936 


.9368 


.9648 


3 


.0058 


.0381 


.1052 


.2031 


.3215 


.4482 


.5722 


.6846 


.7799 


.8555 


4 


.0004 


.0050 


.0214 


.0563 


.1138 


.1941 


.2936 


.4059 


.5230 


.6367 


5 


.0000 


.0004 


.0029 


.0104 


.0273 


.0580 


.1061 


.1737 


.2604 


.3633 



6 


.0000 


.0000 


.0002 


.0012 


.0042 


.0113 


.0253 


.0498 


.0885 


.1445 


7 


.0000 


.0000 


.0000 


.0001 


.0004 


.0013 


.0036 


.0085 


.0181 


.0352 


8 


.0000 


.0000 


.0000 


.0000 


.0000 


.0001 


.0002 


.0007 


.0017 


.0039 






9 1 


.3698 


.6126 


.7684 


.8658 


.9249 


.9596 


.9793 


.9899 


.9954 


.9980 


2 


.0712 


.2252 


.4005 


.5638 


.6997 


.8040 


.8789 


.9295 


.9615 


.9805 


3 


.0084 


.0530 


.1409 


.2618 


.3993 


.5372 


.6627 


.7682 


.8505 


.9102 


4 


.0006 


.0083 


.0339 


.0856 


.1657 


.2703 


.3911 


.5174 


.6386 


.7461 


5 


.0000 


.0009 


.0056 


.0196 


.0489 


.0988 


.1717 


.2666 


.3786 


.5000 


6 


.0000 


.0001 


.0006 


.0031 


.0100 


.0253 


.0536 


.0994 


.1658 


.2539 


7 


.0000 


.0000 


.0000 


.0003 


.0013 


.0043 


.0112 


.0250 


.0498 


.0898 


8 


.0000 


.0000 


.0000 


.0000 


.0001 


.0004 


.0014 


.0038 


.0091 


.0195 


9 


.0000 


.0000 


.0000 


.0000 


.0000 


.0000 


.0001 


.0003 


.0008 


.0020 


10 1 


.4013 


.6513 


.8031 


.8926 


.9437 


.9718 


.9865 


.9940 


.9975 


.9990 


2 


.0861 


.2639 


.4557 


.6242 


.7560 


.8507 


.9140 


.9536 


.9767 


.9893 


3 


.0115 


.0702 


.1798 


.3222 


.4744 


.6172 


.7384 


.8327 


.9004 


.9453 


4 


.0010 


.0128 


.0500 


.1209 


.2241 


.3504 


.4862 


.6177 


.7340 


.8281 


5 


.0001 


.0016 


.0099 


.0328 


.0781 


.1503 


.2485 


.3669 


.4956 


.6230 


6 


.0000 


.0001 


.0014 


.0064 


.0197 


.0473 


.0949 


.1662 


.2616 


.3770 


7 


.0000 


.0000 


.0001 


.0009


.0035 


.0106


.0260 


.0548 


.1020 


.1719 


8 


.0000 


.0000 


.0000 


.0001 


.0004 


.0016 


.0048 


.0123 


.0274 


.0547 


9 


.0000 


.0000 


.0000 


.0000 


.0000 


.0001 


.0005 


.0017 


.0045 


.0107 


10 


.0000 


.0000 


.0000 


.0000 


.0000 


.0000 


.0000


.0001 


.0003 


.0010
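
Judging from the entries (for example .3366 for n = 8, first row, in the first column), the binomial table above appears to list upper-tail probabilities Pr(X ≥ x) for a binomial variable with n trials and success probability π. A minimal Python sketch under that assumption follows; the function name `upper_tail` is ours, not the book's.

```python
from math import comb


def upper_tail(n: int, x: int, pi: float) -> float:
    """Pr(X >= x) for X ~ Binomial(n, pi), summed term by term."""
    return sum(comb(n, k) * pi**k * (1 - pi) ** (n - k) for k in range(x, n + 1))


if __name__ == "__main__":
    # Compare with the tabled entries for (n, x, pi) = (8, 1, .05) and (10, 5, .50).
    print(round(upper_tail(8, 1, 0.05), 4))
    print(round(upper_tail(10, 5, 0.50), 4))
```

Rounding the result to four decimals makes it directly comparable with a table entry.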
Table IV Areas for a Standard Normal Distribution



An entry in the table is the area under the 
curve, between z = 0 and a positive value of z. 
Areas for negative values of z are obtained 
by symmetry. 
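
The area described above can also be computed directly. The sketch below uses the identity (area from 0 to z) = ½·erf(z/√2) and handles negative z by symmetry; the function name `area_0_to_z` is an illustrative choice, not part of the text.

```python
import math


def area_0_to_z(z: float) -> float:
    """Area under the standard normal curve between 0 and z.

    Negative z is handled by symmetry, as the table note prescribes.
    """
    return 0.5 * math.erf(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Areas for z = 1.00 and z = 1.96, rounded to four places like the table.
    print(round(area_0_to_z(1.00), 4))
    print(round(area_0_to_z(1.96), 4))
```

For instance, Pr(-1 < Z < 1.96) would be `area_0_to_z(-1) + area_0_to_z(1.96)`.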




Area = Probability 



Second Decimal Place of z

  z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
 0.0   .0000  .0040  .0080  .0120  .0160  .0199  .0239  .0279  .0319  .0359
 0.1   .0398  .0438  .0478  .0517  .0557  .0596  .0636  .0675  .0714  .0753
 0.2   .0793  .0832  .0871  .0910  .0948  .0987  .1026  .1064  .1103  .1141
 0.3   .1179  .1217  .1255  .1293  .1331  .1368  .1406  .1443  .1480  .1517
 0.4   .1554  .1591  .1628  .1664  .1700  .1736  .1772  .1808  .1844  .1879
 0.5   .1915  .1950  .1985  .2019  .2054  .2088  .2123  .2157  .2190  .2224
 0.6   .2257  .2291  .2324  .2357  .2389  .2422  .2454  .2486  .2517  .2549
 0.7   .2580  .2611  .2642  .2673  .2703  .2734  .2764  .2794  .2823  .2852
 0.8   .2881  .2910  .2939  .2967  .2995  .3023  .3051  .3078  .3106  .3133
 0.9   .3159  .3186  .3212  .3238  .3264  .3289  .3315  .3340  .3365  .3389
 1.0   .3413  .3438  .3461  .3485  .3508  .3531  .3554  .3577  .3599  .3621
 1.1   .3643  .3665  .3686  .3708  .3729  .3749  .3770  .3790  .3810  .3830
 1.2   .3849  .3869  .3888  .3907  .3925  .3944  .3962  .3980  .3997  .4015
 1.3   .4032  .4049  .4066  .4082  .4099  .4115  .4131  .4147  .4162  .4177
 1.4   .4192  .4207  .4222  .4236  .4251  .4265  .4279  .4292  .4306  .4319
 1.5   .4332  .4345  .4357  .4370  .4382  .4394  .4406  .4418  .4429  .4441
 1.6   .4452  .4463  .4474  .4484  .4495  .4505  .4515  .4525  .4535  .4545
 1.7   .4554  .4564  .4573  .4582  .4591  .4599  .4608  .4616  .4625  .4633
 1.8   .4641  .4649  .4656  .4664  .4671  .4678  .4686  .4693  .4699  .4706
 1.9   .4713  .4719  .4726  .4732  .4738  .4744  .4750  .4756  .4761  .4767
 2.0   .4772  .4778  .4783  .4788  .4793  .4798  .4803  .4808  .4812  .4817
 2.1   .4821  .4826  .4830  .4834  .4838  .4842  .4846  .4850  .4854  .4857
 2.2   .4861  .4864  .4868  .4871  .4875  .4878  .4881  .4884  .4887  .4890
 2.3   .4893  .4896  .4898  .4901  .4904  .4906  .4909  .4911  .4913  .4916
 2.4   .4918  .4920  .4922  .4925  .4927  .4929  .4931  .4932  .4934  .4936
 2.5   .4938  .4940  .4941  .4943  .4945  .4946  .4948  .4949  .4951  .4952
 2.6   .4953  .4955  .4956  .4957  .4959  .4960  .4961  .4962  .4963  .4964
 2.7   .4965  .4966  .4967  .4968  .4969  .4970  .4971  .4972  .4973  .4974
 2.8   .4974  .4975  .4976  .4977  .4977  .4978  .4979  .4979  .4980  .4981
 2.9   .4981  .4982  .4982  .4983  .4984  .4984  .4985  .4985  .4986  .4986
 3.0   .4987  .4987  .4987  .4988  .4988  .4989  .4989  .4989  .4990  .4990
Table V Student's t Critical Points




0 A 
Critical 
point 



d.f. \ Pr    .10      .05     .025      .01     .005
    1       3.078    6.314   12.706   31.821   63.657
    2       1.886    2.920    4.303    6.965    9.925
    3       1.638    2.353    3.182    4.541    5.841
    4       1.533    2.132    2.776    3.747    4.604
    5       1.476    2.015    2.571    3.365    4.032
    6       1.440    1.943    2.447    3.143    3.707
    7       1.415    1.895    2.365    2.998    3.499
    8       1.397    1.860    2.306    2.896    3.355
    9       1.383    1.833    2.262    2.821    3.250
   10       1.372    1.812    2.228    2.764    3.169
   11       1.363    1.796    2.201    2.718    3.106
   12       1.356    1.782    2.179    2.681    3.055
   13       1.350    1.771    2.160    2.650    3.012
   14       1.345    1.761    2.145    2.624    2.977
   15       1.341    1.753    2.131    2.602    2.947
   16       1.337    1.746    2.120    2.583    2.921
   17       1.333    1.740    2.110    2.567    2.898
   18       1.330    1.734    2.101    2.552    2.878
   19       1.328    1.729    2.093    2.539    2.861
   20       1.325    1.725    2.086    2.528    2.845
   21       1.323    1.721    2.080    2.518    2.831
   22       1.321    1.717    2.074    2.508    2.819
   23       1.319    1.714    2.069    2.500    2.807
   24       1.318    1.711    2.064    2.492    2.797
   25       1.316    1.708    2.060    2.485    2.787
   26       1.315    1.706    2.056    2.479    2.779
   27       1.314    1.703    2.052    2.473    2.771
   28       1.313    1.701    2.048    2.467    2.763
   29       1.311    1.699    2.045    2.462    2.756
   30       1.310    1.697    2.042    2.457    2.750
   40       1.303    1.684    2.021    2.423    2.704
   60       1.296    1.671    2.000    2.390    2.660
  120       1.289    1.658    1.980    2.358    2.617
    ∞       1.282    1.645    1.960    2.326    2.576
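
The last row of Table V (d.f. = ∞) coincides with the critical points of the standard normal distribution. As a rough check, the sketch below recomputes that row with Python's standard library; the helper name `normal_critical_point` is hypothetical, and Pr is taken to be the one-tailed (upper-tail) probability.

```python
from statistics import NormalDist


def normal_critical_point(tail_prob: float) -> float:
    """Value z such that the area to the right of z equals tail_prob."""
    return NormalDist().inv_cdf(1.0 - tail_prob)


if __name__ == "__main__":
    # One critical point per column heading of Table V.
    for pr in (0.10, 0.05, 0.025, 0.01, 0.005):
        print(pr, round(normal_critical_point(pr), 3))
```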
Table VI C² Critical Points* (C² = χ²/d.f.)




d.f. \ Pr   .995      .99      .975     .95      .90     .10    .05    .025   .01    .005
    1     .000039   .00016   .00098   .0039    .0158    2.71   3.84   5.02   6.63   7.88
    2     .00501    .0101    .0253    .0513    .1054    2.30   3.00   3.69   4.61   5.30
    3     .0239     .0383    .0719    .117     .195     2.08   2.60   3.12   3.78   4.28
    4     .0517     .0743    .121     .178     .266     1.94   2.37   2.79   3.32   3.72
    5     .0823     .111     .166     .229     .322     1.85   2.21   2.57   3.02   3.35
    6     .113      .145     .206     .273     .367     1.77   2.10   2.41   2.80   3.09
    7     .141      .177     .241     .310     .405     1.72   2.01   2.29   2.64   2.90
    8     .168      .206     .272     .342     .436     1.67   1.94   2.19   2.51   2.74
    9     .193      .232     .300     .369     .463     1.63   1.88   2.11   2.41   2.62
   10     .216      .256     .325     .394     .487     1.60   1.83   2.05   2.32   2.52
   11     .237      .278     .347     .416     .507     1.57   1.79   1.99   2.25   2.43
   12     .256      .298     .367     .435     .525     1.55   1.75   1.94   2.18   2.36
   13     .274      .316     .385     .453     .542     1.52   1.72   1.90   2.13   2.29
   14     .291      .333     .402     .469     .556     1.50   1.69   1.87   2.08   2.24
   15     .307      .349     .417     .484     .570     1.49   1.67   1.83   2.04   2.19
   16     .321      .363     .432     .498     .582     1.47   1.64   1.80   2.00   2.14
   18     .348      .390     .457     .522     .604     1.44   1.60   1.75   1.93   2.06
   20     .372      .413     .480     .543     .622     1.42   1.57   1.71   1.88   2.00
   24     .412      .452     .517     .577     .652     1.38   1.52   1.64   1.79   1.90
   30     .460      .498     .560     .616     .687     1.34   1.46   1.57   1.70   1.79
   40     .518      .554     .611     .663     .726     1.30   1.39   1.48   1.59   1.67
   60     .592      .625     .675     .720     .774     1.24   1.32   1.39   1.47   1.53
  120     .699      .724     .763     .798     .839     1.17   1.22   1.27   1.32   1.36
    ∞     1.000     1.000    1.000    1.000    1.000    1.00   1.00   1.00   1.00   1.00

Interpolation should be performed using reciprocals of the degrees of freedom. 
* To obtain critical values of χ², multiply the critical value of C² by (d.f.)
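
The two notes above can be sketched in a few lines of Python. The function names `chi2_from_c2` and `interpolate_reciprocal` are illustrative assumptions; the interpolation is linear in 1/d.f., as the first note recommends.

```python
def chi2_from_c2(c2: float, df: float) -> float:
    """Critical value of chi-square recovered from the tabled C^2 = chi^2 / d.f."""
    return c2 * df


def interpolate_reciprocal(df: float, df_lo: float, val_lo: float,
                           df_hi: float, val_hi: float) -> float:
    """Interpolate a tabled value linearly in the reciprocal of the d.f."""
    weight = (1.0 / df - 1.0 / df_lo) / (1.0 / df_hi - 1.0 / df_lo)
    return val_lo + weight * (val_hi - val_lo)


if __name__ == "__main__":
    # Chi-square 5% point for 10 d.f., from the tabled C^2 of 1.83.
    print(chi2_from_c2(1.83, 10))
    # C^2 5% point for 35 d.f., between the tabled rows for 30 and 40 d.f.
    print(interpolate_reciprocal(35, 30, 1.46, 40, 1.39))
```

Interpolating in 1/d.f. rather than in d.f. itself places 35 d.f. about four sevenths of the way from the 30 row to the 40 row, not halfway.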
Table VII F Distribution Critical Points 5% (Roman Type) and 1% (Boldface Type) Points




Degrees of 
freedom for 
denomina- 
tor 



Degrees of fro 



40 



75 100 200 500 



1 

2 
3 
4 
5 
6 
7 
8 
9 



161 
4052 



18.51 
98.49 



10.13 
34.12 



7.71 
21.20 



6.61 
16.26 



13.74 



5.59 
12.25 



5.32 
11.26 



5.12 
10.56 



200 
4999 



19.00 
99.01 



9.55 
30.81 



6.94 
18.00 



5.79 
13.27 



_5. 14 
10.92 



4.74 
9.55 



4.46 
8.65 



4.26 
8.02 



216 
5403 



19.16 
99.17 



.28 



225 
5625 



19.25 
99.25 



29.46 28.71 



6.59 



6.39 



16.69 15.98 



5.41 
12.06 



4.76 
9.78 



4.35 
8.45 



4.07
7.59 



3.86
6.99 



5.19 
11.39 



4.53 
9.15 



4.12 
7.85 



3.84 
7.01 



3.63 
6.42 



230 
5764 



19.30 
99.30 



9.01 
28.24 



6.26 
15.52 



5.05 
10.97 



4.39 
8.75 



3.97 
7.46 



3.69 
6.63 



3.48 
6.06 



234 
5859 



237 
5928 



239 
5981



241 
6022 



242 
6056 



243 
6082 



244 
6106



245 
6142 



19.33 19.36 

99.33 99.34|99.36 99.38 99.40 



19.38 



19.39 



8.94 
27.91 



6.16 



15.21 14.98 



4. 

10.67 



4.28 
8.47 



3.87 
7.19 



3.58 
6.37 



3.37 
5.80 



19.40 19.41 19.4 

99.41 99.42 99.43 



8 . 83 
27.67 



6.09



8.84 8.81 
27.49 27.34 



4. 

10.45 



4.21 
8.26 



3.79 
7.00 



3.50 
6.19 



3.29 
5.62 



6.04 6.00 
14.80 14.66 14.54 



.78 
27.23 



5.96 



8.74 
27.13 27.05 



4. 
10.27 



4.15 
8.10 



3.73 
6.84 



3.44 
6.03 



3.23 
5.47 



4.78 
10.15 



4.10 
7.98 



3. 68 
6.71 



3.39 
5.91 



3.18 
5.35 



4.74 
10.05 



4.06
7.87 



3.63 
6.62 



3.34 
5.82 



3.13 
5.26 



5.93 
14.45 



4.70 
9.96 



4.03 
7.79 



3.60 
6.54 



3.31 
5.74 



3.10 
5.18 



5.91 
14.37 



4.68 
9.89 



4.00 
7.72



3.57 
6.47 



3.28 
5.67 



3.07 
5.11



8.71 
26.92 



5.87 
14.24 



4.64 
9.77 



3.96 
7.60 



3.52 
6.35 



3.23 
5.56 



3.02 
5.00 



246 
6169 



19.43 
99.44 



8.69 



248 
6203 



19.44 
99.45 



8.66 



249 
6234 



19.45 
99.46 



250 
6258 



19.46 
99.47 



8.64 8.62 



251 
6286 



19.47 
99.48 



8.60 



26.83 26.69 26.60 26.50 26.41 



25: 

6302 



19.47
99.48 



8.58



253 
6323 



19.48 
99.49 



8.57 



253 
6334 



19.49 
99.49 



254 
6352 



254 
6361 



254 
6366 



19.50 



99.49 99.50 99.50



26.30 26.27 



8.56 8.54 
26.23 26.18 



5.84 
14.15 



4.60 
9.68 



3.92 
7.52 



3.49 
6.27 



3.20 
5.48 



2.98 
4.92 



5. 
14.02 



4.56 
9.55 



3.87 
7.39 



3.44 
6.15 



3. 15 
5.36 



2.93 
4.80 



5.77 5.74 
13.93 13.83 



4.53 
9.47 



3.84 
7.31 



3.41 
6.07 



3. 12 
5.23 



2.90 
4.73 



4.50 
9.38 



3.81 
7.23 



3.38 
5.98 



3.08 
5.20 



2.86 
4.64 



5.71 
13.74 



4.46 
9.29 



3.77 
7.14 



3.34 
5.90 



3.05 
5.11 



2.82 
4.66 



5.70 
13.69 



4.44 
9.24 



3.75 
7.09 



3.32 
5.85 



3.03 
5.06 



2.80 
4.61 



5.68 5.66 5.65 
13.61 13.67 13.52 



4.42 
9.17 



3.72 
7.02 



3.29 
5.78 



3.00 
5.00 



2.77 
4.45 



4.40 
9.13 



3.71 
6.99 



3.28 
6.75 



4. 

9.07 



3.69 
6.94 



3.25 
5.70 



2.96 
4.96] 4.91 



2.76 
4.41 



2.73 
4.36 



8.54 
26.14 



5.64 
13.48 



4.37 
9.04 



3.68 
6.90 



3.24 
5.67 



2.94 
4.88 



2.72 
4.33 



8.53 
26.12 



5.63 
13.46 



4.36 
9.02 



3.67 
6.83 



3.23 
6.65 



2.93 
4.66 



2.71 
4.31 



10 



11 

12 
13 
U 
15 
16 
17 

2 18 

19 

20 

21 

22 

23 
24 

25 



4.96
10.04


4.10
7.56


3.71
6.55


3.48
5.99


3.33
5.64


3.22
5.39


3.14
5.21


3.07
5.06


3.02
4.95


2.97
4.85


2.94
4.78


2.91
4.71


2.86
4.60


2.82
4.52


2.77
4.41


2.74
4.33


2.70
4.25


2.67
4.17


2.64
4.12


2.61
4.05


2.59
4.01


2.56
3.96


2.55
3.93


2.54
3.91


4.84 


3.98


3.59 


3.36 


3.20 


3.09 


3.01 


2.95 


2 . 90 


2.86 


2.82 


2 . 79 


2 . 74 


2. 70 


2 . 65 


2 61 


2 57 


2 53 


2 50 


2 47 


2 45 


2 42 


2.41 


2.40 


_a~6& 




_€L22. 


-5^61 


Ju5% 




JL88 


JL74 


JL63 


4.54 


4.46 

■** "* 1 


4.40 




4.29 


4.21 

- ** ' 


4.10 

1 


4^02 


3^94 


3.86 


3.80 


3.74 


3^70 


3^66 


3.62 


3.60 


4.75 


3.89 


3.49 


3.26 


3.11 


3.00 


2.92 


2.85 


2. 80 


2.76 


2 . 72 


2. 69 


2 .64 


2. 60 




2 50 


2 46 


2 42 


2 40 


2 36 


2 35 


2 32 


2 31 


2. 30 


9,33 


6.93 


5.95 


5.41 


5.06 


4.82 


4.65 


4.50 


4.39 


4.30 


4.22 


4.16 


4.05 


3.98 


3] 86 


3^78 


3^70 


3^61 


3! 56 


3^49 


3A6 


3^41 


3i38 


3^36 


4:67 


3.80 


3.41 


3. 18 


3.02 


2.92 


2.84 


2. 77 


2 . 72 


2. 67 


2 . 63 


2. 60 


2 . 55 


2.51 


2 46 


2 42 


2 38 


2 34 


2 32 


2 28 


2 26 


2.24 


2. 22 


2 21 


9.07 


6.70 


5.74 


5.20 


4.86 


4.62 


4.44 


4.30 


4.19 


4.10 


4.02 


3.96 


3.85 


3.78 


3^67 


3! 59 


3. 51 


3.42 


3^37 


3^30 


3 1 27 


3.21 


3ll8 


3!l6 


4.60 


3.74 


3.34 


3.11 


2.96 


2.85 


2.77 


2.70 


2.65 


2.60 


2 . 56 


2. 53 


2.48 




2 39 


2 35 


2 31 


2 27 


2 24 


2 21 


2 19 


2.16 


2. 14 


2. 13 


8.86 


6.51 


5.66 


6.03 


4.69 


4.46 


4.28 


4.14 


4.03 


3.94 


3.86 


3.80 


3.70 


3^62 


3.51 


3^43 


3^34 


3^26 


3.21 




3 111 


3.06 


3^02 


3 loo 


4.54 


3.68 


3.29 


3.06 


2.^0 


2.79 


2.70 


2.64 


2.59 


2.55 


2.51 


2.48 


2.43 


2.39 


2 .33 


2.29 


2.25 


2.21 


2. 18 


2.15 


2. 12 


2.10 


2.08 


2 . 07 


8.68 


6.36 


5.42 


4.89 


4.66 


4.32 


4 .14 


4 .00 


3 .89 


3 .80 


3.73 


3.67 


3.56 


3 48 


3.36 


3 .29 


3.20 


3.12 


3.07 


3.00 


2.97 


2.92 


2.89 


2.87 


4.49 


3.63 


3.24 


3.01 


2.85 


2.74 


2.66 


2.59 


2.54 


2.49 


2.45 


2.42 


2.37 


2.33 


2.28 


2 24 


2.20 


2.16 


2. 13 


2.09 


2.07 


2.04 


2.02 


2.01 


8.53 


6 .23 


5.29 


4.77 


4 . 44 


4.20 


4 03 




3 78 


3 69 


3 61 


3.55 


3.45 


3.37 


3.25 


3.18 


3.10 


3.01 


2.96 


2.89 


2 .86 


2.80 


2,77 


2.75 


4 . 45 


3.59 


3 . 20 


2. 96 


2.81 


2 . 70 




2 . 55 


2 50 


2 45 


2 41 


2.38 


2 33 


2 29 


2 23 


2.19 


2.15 


2.11 


2.0S 


2.04 


2 02 


1 .99 


1 .97 


1 96 


8.40 


6.11 


5.18 


4.67 


4.34 


4.10 


3^93 


3.79 


3! 68 


3l59 


3.52 


3^45 


3! 35 


3^27 


3ll6 


3.08 


3 loo 


2.92 


2.86 


2.79 


2.76 


2.70 


2.67 


2^65 


4.41 


3.55 


3. 16 


2.93 


2.77 


2.66 


2.58 


2.51 


2.46 


2.41 


2.37 


2.34 


2.29 


2.25 


2.19 


2.1,5 


2.11 


2.07 


2.04 


2.00 


1 . 98 


1.95 


1 .93 


1.92 


8.28 


6.01 


5.09 


4.58 


4.25 


4.01 


3.86 


3.71 


3.60 


3.51 


3.44 


3.37 


3.27 


3.19 


3.07 


3.00 


2.91 


2. S3 


2.78 


2.71 


2.68 


2.62 


2.59 


2.57 


4. 38 


3.52 


3.13 


2.90 


2.74 


2.63 


2.55 


2.48 


2.43 


2.38 


2 . 34 


2 3 1 


2.26 


2.21 


2.15 


2.11 


2.07 


2.02 


2.00 


1.96 


1.94 


1.91 


1.90 


1.88 


8.18 


5. S3 


5.01 


4.60 


4.17 


3.94 


3.77 


3.63 


3.62 


3.43 


3.36 


3^30 


3.19 


3.12 


3.00 


2.92 


2.84 


2.76 


2.70 


2.63 


2.60 


2.54 


2.51 


2.49 


4.35 


3.49 


3.10 


2.87 


2.71 


2.60 


2.52 


2.45 


2.40 


2.35 


2.31 


2.2S 


2.23 


2. IS 


2.12 


2.0S 


2.04 


1.99 


1.96 


1.92 


1.90 


1.87 


1.85 


1.84 


8.10 


5.85 


4.94 


4.43 


4.10 


3.87 


3.7?. 


3.56 


3.45 


3.37 


3.30 


3.23 


3.13 


3.05 


2.94 


2.86 


2.77 


2.69 


2.63 


2.56 


2.53 


2.47 


2.44 


2.42 


4.32 


3.47 


3.07 


2.84 


2.68 


2.57 


2.49 


2.42 


2.37 


2 . 32 


2.28 


2.25 


2.20 


2.15 


2.09 


2.05 


2.00 


1 .96 


1.93 


1.89 


1.87 


1 . 84 


1.82 


1-81 


8.02 


5.78 


4.87 


4.37 


4.04 


3.81 


3.65 


3.51 


3.40 


3.31 


3.24 


3.17 


3.07 


2.99 


2.88 


2. SO 


2.72 


2.63 


2.58 


2.51 


2.47 


2.42 


2.38 


2.36 


4.30 


3.44 


3.05 


2.82 


2.66 


2.55 


2.47 


2.40 


2.35 


2.30 


2.26 


2.23 


2.18 


2.13 


2.07 


2.03 


1.98 


1.93 


1.91 


1.87 


1.84 


1.81 


1.80 


1.7S 


7.94 


5.72 


4.82 


4.31 


3.99 


3.76 


3.59 


3.46 


3.35 


3.26 


3.18 


3.12 


3.02 


2.94 


2.83 


2.75 


2.67 


2.53 


2.63 


2.46 


2.42 


2.37 


2.33 


2.31 


4.28 


3.42 


3.03 


2.80 


2.64 


2.53 


2.45 


2.38 


2.32 


2.28 


2.24 


2 20 


2.14 


2.10 


2.04 


2.00 


1.96 


1.91 


1.88 


1.84 


1.82 


1.79 


177 


1.76 


7.88 


5.66 


4 .76 


4.26 


3.94 


3.71 


3.54 


3.41 


3.30 


3.21 


3.14 


3.07 


2.97 


2.89 


2.78 


2.70 


2.62 


2.53 


2.48 


2.41 


2.37 


2.32 


2.28 


2-26 


4.26 


3.40 


3.01 


2.78 


2.62 


2.51 


2.43 


2.36 


2.30 


2.26 


2. -£2 


2.18 


2.13 


2.09 


2.02 


1.98 


1.94 


1.89 


1.86 


1.82 


1.80 


1.76 


1.74 


1.73 


7.82 


5.61 


4.72 


4.22 


3.90 


3.67 


3.50 


3.36 


3.25 


3.17 


3.09 


3.03 


2.93 


2.86 


2.74 


2.66 


2.58 


2.49 


2.44 


2.36 


2.33 


2.27 


2.23 


2.21 


4.24 


-3,38 


-2.99 


2.76 


2.60 


2,4ft 


2.41 


2,34 


-2.28 


2.24 


2.20 


2.16 


2.11 


2.06 


2.00 


1.96 


1.92 


1.87 


1.84 


1.80 


1.77 


1.74 


1.72 


1.71 


7.77 


5.57 


4.68 


4.18 


3.86 


3.63 


3.46 


3.32 


3.21 


3.13 


3.05 


2.99 


2.89 


2.81 


2.70 


2.62 


2.54 


2.46 


2.40 


2.32 


2.29 


2.23 


2.1* 


2.17 



(Contd) 



Table VII (Continued) 



Degrees of 
freedom for 


Degrees of freedom for numerator 


denomina- 
tor 


1


2 


3 


4 


5 


6 


7 


8 


9 


10 


11 


12 


14 


16 


20 


24 


30 


40 


50 


75 


100 


200 


500 


∞


26 


4.22 
7.72 


3.37 
5.53


2.98
4.64 


2.74 
4.14 


2.59 
3.82 


2.47 
3.59 


2.39 
3.42 


2.32 
3.29 


2.27 
3.17 


2.22 
3.09 


2.18 
3.02 


2.15 
2.96 


2.10 
2.86 


2.05 
2.77 


1.99 
2.66 


1.95 
2.58 


1.90 
2.50 


1.85 
2.41 


1.82 
2.36 


1.78 
2.28 


1.76 
2.25 


1.72 
2.19 


1.70 
2.15 


1.69 
2.13 


27 


4.21 
7.68 


3.35 
5.49 


2.96 
4.60 


2.73 
4.11 


2.57 
3.79 


2.46 
3.56 


2.37 
3.39 


2.30 
3.26 


2.25 
3.14 


2.20 
3.06 


2.16 
2.98 


2.13 
2.93 


2.08 
2.83 


2.03 
2.74 


1.97 
2.63 


1.93 
2.55 


1.88 
2.47 


1.84 
2.38


1.80 
2.33


1.76 
2.25 


1.74 
2.21 


1.71 
2.16 


1.68 
2.12 


1.67 
2.10 


28 


4.20 
7.64 


3.34 
5.45


2.95 
4.57 


2.71 
4.07 


2.56 
3.76 


2.44 
3.53 


2.36 
3.35 


2.29 
3.23 


2.24
3.11 


2.19 
3.03 


2.15 
2.95 


2.12 
2.90 


2.06 
2.80 


2.02 
2.71 


1.96 
2.60 


1.91 
2.52


1.87 
2.44 


1.81 
2.35 


1.78 
2.30 


1.75 
2.22 


1.72 
2.18 


1.69 
2.13 


1.67 
2.09 


1.65 
2.06 


29 


4.18 
7.60 


3.33 
5.42


2.93 
4.64 


2.70 
4.04 


2.54
3.73 


2.43
3.50


2.35 
3.33 


2.28 
3.20 


2.22
3.08


2.18 
3.00 


2.14 
2.92 


2.10 
2.87 


2.05 
2.77 


2.00 
2.68 


1.94 
2.57 


1.90 
2.49 


1.85 
2.41 


1.80 
2.32 


1.77 
2.27 


1.73 
2.19 


1.71 
2.15 


1.68 
2.10 


1.65 
2.06 


1.64 
2.03 


30 


4.17 
7.56


3.32 
5.39


2.92 
4.61 


2.69 
4.02 


2.53 
3.70 


2.42 
3.47 


2.34 
3.30 


2.27 
3.17


2.21 
3.06 


2.16 
2.98 


2.12 
2.90 


2.09 
2.84 


2.04 
2.74 


1.99 
2.66 


1.93 
2.55 


1.89 
2.47 


1.84 
2.33 


1.79 
2.29 


1.76 
2.24 


1.72 
2.16 


1.69 
2.13 


1.66 
2.07 


1.64 
2.03 


1.62 
2.01 


32 


4.15 
7.50


3.30 
5.34 


2.90 
4.46 


2.67 
3.97 


2.51 
3.66 


2.40 
3.42 


2.32 
3.25 


2.25 
3.12 


2.19 
3.01 


2.14 
2.94 


2.10 
2.86 


2.07 
2.80 


2.02 
2.70 


1.97 
2.62 


1.91 
2.51 


1.86 
2.42 


1.82 
2.34 


1.76 
2.25 


1.74 
2.20 


1.69 
2.12 


1.67 
2.08 


1.64 
2.02 


1.61 
1.98 


1.59 
1.96 


34 


4.13 
7.44 


3.28 
5.29 


2.88 
4.42 


2.65 
3.93 


2.49 
3.61 


2.38 
3.38 


2.30 
3.21 


2.23 
3.08 


2.17 
2.97 


2.12 
2.89 


2.08 
2.82 


2.05 
2.76 


2.00 
2.66 


1.95 
2.58


1.89 
2.47 


1.84
2.38


1.80
2.30


1.74 
2.21 


1.71 
2.15 


1.67 
2.08 


1.64 
2.04 


1.61 
1.98 


1.59 
1.94 


1.57 
1.91 


36 


4.11 
7.39 


3.26 
5.25 


2.86 
4.38 


2.63 
3.89 


2.48 
3.58 


2.36 
3.35 


2.28 
3.18 


2.21 
3.04 


2.15 
2.94 


2.10 
2.86 


2.06 
2.78 


2.03 
2.72 


1.98
2.62 


1.93 
2.54 


1.87 
2.43 


1.82
2.35 


1.78
2.26 


1.72 
2.17 


1.69 
2.12 


1.65 
2.04 


1.62 
2.00 


1.59 
1.94 


1.56 
1.90 


1.55 
1.87 


38 


4.10 
7.35 


3.25 
5.21 


2.85 
4.34 


2.62 
3.86 


2.46 
3.54


2.35 
3.32 


2.26 
3.15 


2.19 
3.02 


2.14 
2.91 


2.09 
2.82 


2.05 
2.75 


2.02 
2.69 


1.96
2.59 


1.92 
2.51 


1.85 
2.40


1.80 
2.32 


1.76 
2.22 


1.71 
2.14 


1.67 
2.08 


1.63 
2.00 


1.60 
1.97 


1.57 
1.90 


1.54 
1.86 


1.53 
1.84 


40 


4.08 
7.31 


3.23 
5.18 


2.84
4.31


2.61 
3.83 


2.45 
3.51 


2.34 
3.29 


2.25 
3.12 


2.18 
2.99 


2.12 
2.88 


2.07 
2.80 


2.04 
2.73 


2.00 
2.66 


1.95 
2.56 


1.90 
2.49 


1.84 
2.37 


1.79 
2.29 


1.74 
2.20 


1.69
2.11 


1.66 
2.05 


1.61 
1.97 


1.59 
1.94 


1.55 
1.88 


1.53 
1.84 


1.51 
1.81 


42 


4.07 
7.27 


3.22 
5.15


2.83 
4.29 


2.59 
3.80 


2.44 
3.49 


2.32 
3.26 


2.24 
3.10 


2.17 
2.96 


2.11 
2.86 


2.06 
2.77


2.02 
2.70 


1.99 
2.64 


1.94 
2.54 


1.89 
2.46 


1.82 
2.35 


1.78 
2.26 


1.73 
2.17 


1.68
2.08 


1.64 
2.02 


1.60 
1.94 


1.57 
1.91 


1.54 
1.85 


1.51 
1.80 


1.49 
1.78 


44 


4.06 
7.24


3.21 
5.12 


2.82 
4.26 


2.58 
3.78 


2.43 
3.46 


2.31 
3.24 


2.23 
3.07 


2.16 
2.94 


2.10 
2.84 


2.05 
2.75 


2.01 
2.68 


1.98 
2.62 


1.92 
2.52 


1.88
2.44 


1.81 
2.32 


1.76 
2.24 


1.72 
2.15 


1.66 
2.06 


1.63 
2.00 


1.58 
1.92 


1.56 
1.88 


1.52 
1.82 


1.50 
1.78 


1.48 
1.75 


46 


4.05 
7.21 


3.20 
5.10 


2.81 
4.24 


2.57 
3.76 


2.42 
3.44 


2.30 
3.22 


2.22 
3.05 


2.14 
2.92 


2.09 
2.82 


2.04 
2.73 


2.00 
2.66 


1.97 
2.60 


1.91 
2.50 


1.87 
2.42 


1.80 
2.30 


1.75 
2.22 


1.71 
2.13 


1.65 
2.04 


1.62 
1.98 


1.57 
1.90 


1.54 
1.86 


1.51 
1.80 


1.48 
1.76 


1.46 
1.72 


48 


4.04 
7.19 


3.19 
5.08 


2.80 
4.22 


2.56 
3.74 


2.41 
3.42 


2.30 
3.20 


2.21 
3.04 


2.14 
2.90 


2.08
2.80 


2.03 
2.71 


1.99 
2.64 


1.96 
2.58


1.90 
2.48


1.86 
2.40 


1.79 
2.28 


1.74 
2.20 


1.70 
2.11 


1.64 
2.02 


1.61 
1.96 


1.56 
1.83 


1.53 
1.84 


1.50 
1.78 


1.47
1.73 


1.45 
1.70 



55 
60 
65 
70 
80 
100 

u> 

125 
150 
200 
400 
1000 



4.03 
7.17 



4.02 
7.12 



4.00 
7.08 



3.99 
7.04 



3.98 
7.01 



3.96
6.95 



3.94 
6.90 



3.92 
6.84 



3.91 
6.81 



3.89 
6.76 



3.86
6.70 



3.85 
6.66 



3.84
6.64 



TTT5 
6.06 

3.17 
5.01 

3.35 
4.98 

3.14 
4.95 

3.13 
4.92 

3.11 
4.88 

3.09 
4.82 

3.07 
4.78 

3.00 
4.75 

3.04 
4.71 

3.02 
4.66 

3.00 
4.62 

2.99 
4.60 



2771? ~236 



4.20 



2.78 
4.16 



2.76' 
4.13 



2.75 
4.10 



2.74 
4.08 



2.70 
3.98 



2.68 
3.94 



2.67 
3.91 



2.65 
3.88 



2.62 
3.83 



2.61 
3.80 



2.60 
3.78 



3.72 



2.54 
3.68 



2.52 
3.65 



2.51 
3.62 



2.50 
3.60 



2. 48 
3.56 



2.46 
3.51 



2.44 
3.47 



2.43 
3.44 



2.41 
3.41 



2.39 
3.36 



2.37 
3.32 



"2 '."'40 
3.41 



2.3S 
3.37 



3.34 

2.36 
3.31 



2.35 
3.29 



2.33 
3.25 



2.30 
3.20 



2.29 
3.17 



2.27 
3.13 



2.26 
3.11 



2.23 
3.06 



2.38 2.22 
3.34 



2.21 
3.02 



2.27 
3.15 



2.25 
3.12 



2 24 
3^09 



2 32 
3^07 



2.21 
3.04 



2.19 
2.99 



2.17 
2.95 



2.16 
2.92 



2.14 
2.90 



2 12 
5L85 



2.10 
2.82 



2.09 
2.80 



2.18 
2.98 



2.17 
2.95 



2.15 
2.93 



2.14 
2.91 



2.12 
2.87 



2.10 
2.82 



2.08 
2.79 



2.07 
2.76 



2.05 
2.73 



2.03 
2.69 



2.02 
2.66 



2.01 
2.64 



2.11 
2.85 



2.10 
2.82 



2.08 
2.79 



2.07 
2.77 



2.05 
2.74 



2.03 
2.69 



2.01 
2.65 



2.00 
2.62 



1.98 
2.60 



1.96 
2.65 



1.95 
2.53 



1.94 
2.51 



2.05 
2.75 



2.04 
2.72 



2.02 
2.70 



2.01 
2-67 



1.99 
2.64 



1.97 
2.59 



1. 
2.56 



1.94 
2.53 



1.92 
2.50 



1.90 
2.46 



1.89 
2.43 



1.88 
2.41 



2.00 
2.66 



1.99 
2.63 



1.98 
2.61 



1.97 
2.59 



1.95 
2.55 



1.92 
2.51 



1.90 
2.47 



1.89 
2.44 



1.87 
2.41 



1.85 
2.37 



1.84 
2.34 



1.83 
2.32 
2.62



1.9' 
2.69 



1.95 
2.56 



1.94 
2.54 



1.93 
2.51 



1.91 
2.48 



1.88 
2.43 



1.86 
2.40 



1.85 
2.37 



1.83 
2.34 



1.81 
2.29 



1. 

2.26 



1.79 
2.24 



2.56 



1.93 
2.53 



1.92 
2.50 



1.90 
2.47 



1.89 
2.45 



1.88 
2.41 



1.85 
2.36 



1.83 
2.33 



1.82 
2.30 



1.80 
2.28 



1.78 
2.23 



1.76 
2.20 



1.75 
2.18 



2.46 



1.88 
2.43 



1 . 86 
2.40 



1. 
2.37 



1.84 
2.35 



1.82 



1.79 
2. 25 



1.77 
2.23 



1.76 
2.20 



1.74 
1.17 



1.72 
2.12 



1.70 
2.09 



1.69 
2.07 



2.39 



1.83 
2. 35 



1.81 
2.32 



1.80 
2.30 



79 



1.77 
2.24 



1,75 
2. IS 



1.72 
2.15 



1.71 
2.12 



1.69 
2.09 



1.67 
2.04 



1.65 
2.01 



1.64 
1.99 



2.26 



1 .76 
2.23 



1.75 
2.20 



1.7: 
2.18 



1.72 
2.15 



1.70 
2.11 



1. 
2.06 



1.65 
2.03 



1.64 
2.00 



1.62 
1.97 



1.58 
1.89 



1.57 
1.87 



2.18 



1.72 
2.15 



1.70 
2.12 



1.68 
2.09 



1.67 
2.07 



1.65 
2.03 



1.63 
1.98 



1.60 
1.94 



1.59 
1.91 



1.57 
1.88 



1.60 1.54 
1.84 



1.53 
1.81 



1.52 
1.79 



2.10 



1.67 
2.06 



1.65 
2.03 



1.63 
2.00 



1.62 
1.98 



1.60 
1.94 



1. 

1.89 



1.55 
1.85 



1.54 
1.83 



1.52 
1.79 



1.49 
1.74 



1.47 
1.71 



1.46 
1.69 



2.00 



1.61 
1.96 



1.59 
1.93 



1.57 
1.90 



1.56 
1.88 



1.54 
1.84 



1.51 
1.79 



1.49 
1,75 



1.47 
1,72 



1.45 
1.69 



1.42 
1.64 



1.41 
1.61 



1.40 
1.59 
1.58
1.90 



1.56 
1.87 



1.54 
1.84 



1.53 
1.82 



1.51 
1.78 



1.48 
1.73 



1.4, 
1.68 



1.44 
1.66 



1.42 
1.62 



1.38 
1.57 



1.36 
1.54 



1.35 
1.52 



1.55 



1.52 
1.82 



1.50 
1.79 



1.49 
1.76 



1.47 
1.74 



1.45 
1.70 



1.42 
1.64 



1.39 
1.69 



1.37 
1.56 



1.35 
1.53 



1.32 
1.47 



1.30 
1.44 



1.28 
1.41 



1.52 



1,50 
1.78 



1.48 
1.74 



1.46 
1.71 



1.45 
1.69 



1.42 
1.65 



1.39 
1.59 



1.36 
1.54 



1.34 
1.51 



1.32 
1.48 



1.28 
1.42 



1.26 
1.36 



1.24 
1.36 



1.48 



1.46 
1.71 



1.44 
1.68 



1.42 
1.64 



1.40 
1.62 



1.38 
1.57 



1.34 
1.51 



1.31 
1.46 



1.29 
1.43 



1.26 
1,39 



1.22 
1.32 



1.19 
1.28 



1.1 
1.25 



Table VIII Common Logarithms* 



N


0


1 


2 


3 


4 


5 


6 


7 


8


9 


in 


0000 


0041 


0086 


01 98 


0170 


0212 


0253 


0294 


0334 


0374 


1 1 

IX 


04.14 


04 SI 

vtJ J 


040 9 


0S1 1 

uj j i 


0S60 

VJU7 


0607 


0645 


0682 


0710 

\j j 1 y 


O755 


17 


0709 
U / yz. 


087 8 
uozo 


0864 


0800 
vjoyy 


0014 


0060 


1004 

1 UUH- 


1038 


1077 


1 106 


1 1 


I 1 10 


1 1 71 


17f)6 


1 710 


1271 


1 101 

1 JUJ 


1335 


1367 


1399 


1430 


id 


1461 

J H-U 1 , 


1409 

1 t-'Z 


1 S91 

1 JZ, J 


1 SSI 

1 J J J 


1 S84 
i J ot 


1614 


1 644 


1 671 

1 KJ f J 


1 701 

1 / U J 


1732 


lj 


1 761 
1 /DI 


1 700 


1818 

1 O J o 


1 847 

1 04 / 


1 87S 


1 001 

1 7UJ 


1011 

i y j i 


1 oso 

1 y j 7 


1087 
1 yo 1 


2014 


16 


7041 


9068 


900 S 


9 1 99 


9148 


917S 

Z I / J 


9701 


2227 


99S1 


2279 


17 


9104 
z, jut 


9110 


71SS 

lJJJ 


7180 


240 S 


9410 


2455 


2480 


2504 


2529 


IK 


j^j j j 


9S77 

Z J / / 


9601 


969S 


9648 


9679 

ZrU / Zf 


2695 


2718 


2742 


2765 


10 


7788 


2810 


2833 


2856 


2878 


2900 


2923 


2945 


2967 


2989 




101 0 


1019 


10 S4 


107S 

Jv / J 


3096 


3118 


3139 


3160 


3181 


3201 


71 


1779 


1941 


1961 


1784 


3304 


3324 


3345 


3365 


118S 

J JO J 


3404 


/Lit 


1474 


1444 


1464 


1481 

JtOJ 


3502 


3522 


3541 


3560 


3579 


3598 


21 


161 7 


1616 


16SS 


1674 
JU / *+ 


3692 


371 1 


3729 


3747 


3766 


3784 


z/# 




IS 70 


1818 

JOJO 


18S6 


1874 

JO /" 


1809 

J07^ 


3909 


1097 


3945 


3962 


2S 

z.j 


1070 


1007 


4014 
wit 


4011 


4048 


4065 


4082 


4099 


41 16 


4133 


26 


41 SO 


41 66 

L fl l)U 


41 81 

*+I OJ 


4700 


4216 


4912 


4740 


4265 


4281 


4298 


Z 1 


411 4 


4110 


4146 


4169 


4178 

*+ J / o 


4101 


4409 


442 S 


4440 


4456 


2« 
Zo 


4.479 
44 I Z 


4487 
440 / 


4S09 


4S1 8 

H-J I O 


4S11 


4S48 


4S64 

H- JUt 


4S70 


4S04 

*+J /T 


4609 


7Q 


40Z4 


46 1Q 

40 jy 


46 S4 
40 j4 


4660 


4681 

tOOJ 


4608 


471 1 
f / 1 j 


4798 
t / zo 


4742 


4757 


in 


4771 
4 / / 1 


4786 
4 / oO 


4800 


4814 


4890 


4841 
404J 


48 S7 

tOJ / 


4871 
*+o / 1 


4886 

HOOU 


4900 


11 


4Q14 

£f>14 


4Q78 


4049 


40SS 


4060 


4081 


4007 
H-yy 1 


501 1 


5024 


5038 




S0S1 


S06S 

JUUJ 


S070 

ju / y 


S009 


5105 


51 19 


5132 


5145 


5159 


5172 


11 


S18S 


S1 08 
j i yo 


S91 1 


S794 


S917 
j^j / 


S7S0 


5263 


5276 


5289 


5302 


14 


S11 S 


S198 


S140 

J J'+U 


S1S1 

J J J J 


5366 


5378 


539I 


5403 


5416 


5428 


IS 


S441 


S4S1 

JH-J J 


S46S 


S478 

Jt / o 


5490 


5502 


5514 


5527 


5539 


5551 


16 


SS61 

J J U J 


SS7S 


SS87 

JJO / 


ssoo 

J J V 


S61 1 


5623 


5535 


5647 


5658 


5670 


17 


S687 


S604 


S70S 

J / uj 


S71 7 

j / 1 / 


S770 

j / ±*y 


S740 
J / *+u 


5759 


5763 


5775 


5786 


IS 

.JO 


S7Q8 


S800 


S871 


S819 


S841 

J OtJ 


S8SS 

JO J J 


5866 


5877 


5888 


5899 


10 


S01 1 

jy i 1 


S092 


SOU 


S044 


5955 


5966 


5977 


5988 


5999 


6010 


JO 


6021 


6011 


6049 


6053 


6064 


6075 


6085 


6096 


6107 


61 17 


41 


61 78 

KJ 1 ZO 


61 18 

Dl JO 


6140 


61 60 


61 70 


6180 


6191 


6201 


6212 


6222 


42 


6717 


6741 

UZ4J 


69S1 

UZ> J J 


6961 


6774 


6784 


6294 


6304 


6314 


6325 


41 


611S 
0j J J 


614S 
0j4j 


61SS 


616S 

OjOJ 


617S 
O J / J 


618S 

OJOJ 


610S 

OJ7J 


640 S 


641 s 

Utl J 


6425 




641 S 
04j J 


6A44 
0444 


64S4 


6464 


6474 
04 / 4 


6484 


6401 


6S01 


6S1 1 

DJ 1 J 


6522 


4j 


6^19 

0J jZ 


6^49 
Oj4Z 


6SS1 


6S61 
OjO 1 


6S7 1 

O j / 1 


6S80 


6S00 
yjjyyj 


6S00 


6600 


6618 


'IO 


OOZo 


££17 


6646 
0040 


AAS6 
OOjO 


OO0 J 


667S 
OO/ J 


6684. 

DDO+ 


6601 

U07J 


6709 

U / Ux. 


6712 


4/ 


£"79 1 
0 /zl 


£7 in 

0 / jU 


6710 


674Q 


67S8 
0 / jo 


6767 
O /O / 


6776 
0 / / 0 


678 S 

0 / 0 j 


6704 

O / -74 


6801 

UOU J 


4S 
4o 


Oo 1 Z 


OoZ 1 


Do ju 


A810 

UOJ7 


£848 


68S7 

UOJ / 


6866 


6875 


6884 


6893 


4Q 


£00 ~> 


60 J 1 




6078 


6Q17 

U"J / 


6046 


60SS 
\jy J J 


6064 


6079 
yjy 1 ju 


6981 


50 


6990 


6998 


7007 


7016 


7024 


7033 


7042 


7050 


7059 


7067 


51 


7076 


7084 


7093 


7101 


7110 


7118 


7126 


7135 


7143 


7152 


52 


7160 


7168 


7177 


7185 


7193 


7202 


7210 


7218 


7226 


7235 


53 


7243 


7251 


7259 


7267 


7275 


7284 


7292 


7300 


7308 


7316 


54 


7324 


7332 


7340 


7348 


7356 


7364 


7372 


7380 


7388 


7396 






Table VIII (Continued) 



N 

55 
56 
57 
58 
59 
60 
61 
62 
63 
64 
65 
66 
67 
68 
69 
70 
71 
72 
73 
74 
75 
76 
77 
78 
79 
80 
81 
82 
83 
84 
85 
86 
87 
88 
89 
90 
91 
92 
93 
94 
95 
96 
97 
98 
99 



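7404
7482
7559
7634
7709
7782
7853
7924
7993
8062
8129
8195
8261
8325
8388
8451
8513
8573
8633
8692
8751
8808
8865
8921
8976
9031
9085
9138
9191
9243
9294
9345
9395
9445
9494
9542
9590
9638
9685
9731
9777
9823
9868
9912
9956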



7412 
7490 
7566 
7642 
7716 
7789 
7860 
7931 
8000 
8069 
8136 
8202 
8267 
8331 
8395 
8457 
8519 
8579 
8639 
8698 
8756 
8814 
8871 
8927 
8982 
9036 
9090 
9143 
9196 
9248 
9299 
9350 
9400 
9450 
9499 
9547 
9595 
9643 
9689 
9736 
9782 
9827 
9872
9917
9961



7419 
7497 
7574 
7649 
7723 
7796 
7868 
7938 
8007 
8075 
8142 
8209 
8274 
8338 
8401 
8463 
8525 
8585 
8645 
8704 
8762 
8820 
8876 
8932 
8987 
9042 
9096 
9149 
9201 
9253 
9304 
9355 
9405 
9455 
9504 
9552 
9600 
9647 
9694 
9741 
9786 
9832 
9877 
9921 
9965 



7427 
7505 
7582 
7657 
7731 
7803 
7875 
7945 
8014 
8082 
8149 
8215 
8280 
8344 
8407 
8470 
8531 
8591 
8651 
8710 
8768 
8825 
8882 
8938 
8993 
9047 
9101 
9154 
9206 
9258 
9309 
9360 
9410 
9460 
9509 
9557 
9605 
9652 
9699 
9745 
9791 
9836 
9881 
9926 
9969 



7435 
7513 
7589 
7664 
7738 
7810 
7882 
7952 
8021 
8089 
8156 
8222 
8287 
8351 
8414 
8476 
8537 
8597 
8657 
8716 
8774 
8831 
8887 
8943 
8998 
9053 
9106 
9159 
9212 
9263 
9315 
9365 
9415 
9465 
9513 
9562 
9609 
9657 
9703 
9750 
9795 
9841 
9886 
9930 
9974 



7443 
7520 
7597 
7672 
7745 
7818 
7889 
7959 
8028 
8096 
8162 
8228 
8293 
8357 
8420 
8482 
8543 
8603 
8663 
8722 
8779 
8837 
8893 
8949 
9004 
9058 
9112 
9165 
9217 
9269 
9320 
9370 
9420 
9469 
9518 
9566 
9614 
9661 
9708 
9754 
9800 
9845 
9890 
9934 
9978 



7451 
7528 
7604 
7679 
7752 
7825 
7896 
7966 
8035 
8102 
8169 
8235 
8299 
8363 
8426 
8488 
8549 
8609 
8669 
8727 
8785 
8842 
8899 
8954 
9009 
9063 
9117 
9170 
9222 
9274 
9325 
9375 
9425 
9474 
9523 
9571 
9619 
9666 
9713 
9759 
9805 
9850 
9894 
9939 
9983 



7459 
7536 
7612 
7686 
7760 
7832 
7903 
7973 
8041 
8109 
8176 
8241 
8306 
8370 
8432 
8494 
8555 
8615 
8675 
8733 
8791
8848
8904
8960 
9015 
9069 
9122 
9175 
9227 
9279 
9330 
9380 
9430 
9479 
9528 
9576 
9624 
9671 
9717 
9763 
9809 
9854 
9899 
9943 
9987 



8 

7466 
7543 
7619 
7694 
7767 
7839 
7910 
7980 
8048 
8116 
8182 
8248 
8312 
8376 
8439 
8500 
8561 
8621 
8681 
8739 
8797 
8854 
8910 
8965 
9020 
9074 
9128 
9180 
9232 
9284 
9335 
9385 
9435 
9484 
9533 
9581 
9628 
9675 
9722 
9768 
9814 
9859 
9903 
9948 
9991 



* The log of N is "the power to which 10 must be raised to yield N." Thus log 100 = 2, because 10^2 = 100. In this table, only the "mantissa" (the digits to the right of the decimal) is given for each log. The characteristic (the integer to the left of the decimal) depends on the size of the number: for N between 1 and 10 it is 0; log 10N requires the characteristic 1 (for example, log 19.1 = 1.281), log 100N the characteristic 2, log 1000N the characteristic 3, and so on. Thus log 537 = 2.73.
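The footnote's rule can be sketched in a few lines of Python. This is only an illustration, not part of the original table; the function name `characteristic_and_mantissa` is hypothetical.

```python
import math

def characteristic_and_mantissa(n: float) -> tuple[int, float]:
    """Split log10(n) into its integer characteristic and fractional mantissa."""
    log_n = math.log10(n)
    characteristic = math.floor(log_n)   # integer to the left of the decimal
    mantissa = log_n - characteristic    # digits to the right of the decimal
    return characteristic, mantissa

# The two examples quoted in the footnote: log 19.1 and log 537.
for n in (19.1, 537):
    c, m = characteristic_and_mantissa(n)
    print(n, c, round(m, 4))
```

The mantissa depends only on the digits of N, which is why a single table row serves 5.37, 53.7, and 537 alike.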
V. Reproduced, by permission, from R. Fisher and F. Yates, Statistical Tables, Oliver and Boyd, Edinburgh, 1938.
VI. Reproduced, by permission, from W. J. Dixon and F. J. Massey, Introduction 
VII. Reproduced, by permission, from Statistical Methods, 6th Edition, by George W. Snedecor and William G. Cochran, 1967, by the Iowa State University Press, Ames, Iowa.
VIII. Reproduced from John E. Freund, Modern Elementary Statistics, 3rd Edition, © 1967, by permission of Prentice-Hall, Inc., Englewood Cliffs, New Jersey.



Answers to Odd-Numbered Problems

The student is not expected always to calculate the answer as precisely as the given answers below. These answers are given to a fairly high degree of precision merely for the benefit of those who want it; even so, the last digit may be slightly in error because of slide rule inaccuracy.



2-1 Mode < median < mean. The mode 2-15 121.50 
is not a bad central measure in this case, 
which is not very asymmetrical. 



2-17 27.6% (NOT .952/4) 



2-3 77.4,81.25,85 



10 



3 5 



2-5
            Mean     Median   Mode
raw         77.78    81.47    hardly defined
fine        77.4     81.25    85
coarse      78.4     80.00    80



(a) Mode depends too much on the 
degree of grouping. 

(b) Usually, but not always, does the 
coarse grouping give worse ap- 
proximations. 

2-7 range = 30 or 40
MAD = 8.58
standard deviation = √(2736/24) = √114.0 = 10.67



3-1 



n x 23 
(c) — (authors' answer = — = .46) 



2-9 Text coding is preferred because it has 
integral and small values of ,y, which are 
easy to compute. 

2-11 239,483 

2-13 (a) 77.4, 5^484 =11.0 
(b) 8.60, √14.4 = 3.8



n — n. 



= 1 





27 




50 



.54 



3-5 (b) not equally likely; 1/4, 1/4, 1/2 
(c) 3/4 

3-7 (a) .50, .30, .65, .15 
(c) .50, .70, .85, .35 

3-9 (a) A - - 375 

(b) A = -375 

(c) No 



3-11 (a) .40 (e) .17 

(b) .60

(c) .55 io 

(d) .78 



•06 

3-15 (a) — = .29 



= .42( 
= .58/ 



sum = 1 



(b) Yes. Pr(A | A ∪ B) = Pr(A) / Pr(A ∪ B) = Pr(A) / [Pr(A) + Pr(B)]
3-17 (a) B ± 2 x , 



_ i 

2 2 1 



(b) 

(c) 



51 — 13 2 6 

x U - .143 



- .0046 
= .00076 



3-19 (a) 
(b) 

(c) 
3-21 0 



ft - -62 




(b) y p(y) 






x i x • • 



- A = .022 



3-25 (a) Yes. Pr(E1 ∩ E2) = Pr(E1) Pr(E2)

(b) No. Pr(E1 ∩ E3) ≠ Pr(E1) Pr(E3)



3-27 (a) Yes 
(b) Yes



3-29 (a) 



(b) impossible conditions; there must be an error of specification.

(c) ) <|r(^)<-2 

(d) impossible conditions. 



3-31 (a) | 



.001 

3-33 (a) ^— t= .0079 
.126 



r or ti 



(b) 506 

(c) 999 



tosses, 



.001 



.001 -f - (.999) 



See how the probabilities grow toward certainty as n → ∞.



4-1 (a)  x     p(x)
         0     1/16
         1     4/16
         2     6/16
         3     4/16
         4     1/16
               16/16


4-3      x     p(x)
         0     6/36
         1     10/36
         2     8/36
         3     6/36
         4     4/36
         5     2/36
               36/36


4-5 (a) μ = 2, σ² = 1
(b) μ = 1.5, σ² = .75

4-7 (a) μ_Y = 3.5

2/16
6/16
6/16
2/16

16/16

ii 

a_ Y - Vff = 1.7 
(b), (c) /« y = 11 (T Y - 3.4 



4-9      x     p(x)
         0     16/81 = .198
         1     32/81 = .395
         2     24/81 = .296
         3      8/81 = .099
         4      1/81 = .012
               81/81
μ = 4/3 = 1.33
σ² = 8/9 = .89
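The probabilities in 4-9 coincide with a binomial distribution with n = 4 and π = 1/3. That identification is an assumption read off the printed values, not a statement from the problem itself. A minimal sketch that reproduces the table, mean, and variance with exact fractions:

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, pi):
    """Exact binomial probabilities p(0), ..., p(n)."""
    return [comb(n, x) * pi**x * (1 - pi)**(n - x) for x in range(n + 1)]

pi = Fraction(1, 3)          # assumed success probability
pmf = binomial_pmf(4, pi)    # assumed n = 4

mean = sum(x * p for x, p in enumerate(pmf))
variance = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))

for x, p in enumerate(pmf):
    print(x, p, round(float(p), 3))
print("mean =", mean, "variance =", variance)
```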



4-11 (a) /< v = 1.36 

(b) // y - 3.15 

(c) /< v 



17 A" 



V2.43 - 1.56 
;r « V2J6 - 1.47 
4-13 (a) x     p(x)
         0     64/125 = .512
         1     48/125 = .384
         2     12/125 = .096
         3      1/125 = .008
                        1.00
μ = .60
σ² = .48
(graph of p(x) against x)

(b) p(x) = C(3, x)(.1)^x(.9)^(3−x)

         x     p(x)
         0     .729
         1     .243
         2     .027
         3     .001
               1.00
μ = .30
σ² = .27
(graph of p(x) against x)



4-15 p(x) = C(3, x)(1/6)^x(5/6)^(3−x)

         x     p(x)
         0     125/216 = .579
         1      75/216 = .347
         2      15/216 = .070
         3       1/216 = .004
               216/216 = 1.00
μ = 1/2 = .50
σ² = 5/12 = .416
(graph of p(x) against x)



4-17 μ = nπ

σ² = nπ(1 − π)
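The formulas in 4-17 can be checked numerically against the definition of mean and variance. The sketch below is only an illustration; the helper name `binomial_moments` and the three (n, π) pairs are chosen here, the pairs being the ones that the tables for 4-13 and 4-15 appear to use.

```python
from fractions import Fraction
from math import comb

def binomial_moments(n, pi):
    """Return (mean, variance) computed directly from the binomial probabilities."""
    pmf = [comb(n, x) * pi**x * (1 - pi)**(n - x) for x in range(n + 1)]
    mean = sum(x * p for x, p in enumerate(pmf))
    variance = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))
    return mean, variance

for n, pi in [(3, Fraction(1, 5)), (3, Fraction(1, 10)), (3, Fraction(1, 6))]:
    mean, variance = binomial_moments(n, pi)
    # Compare the direct computation with the closed forms n*pi and n*pi*(1 - pi).
    print(n, pi, mean == n * pi, variance == n * pi * (1 - pi))
```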

4-19 (a) .9544

(b) .9495 ≈ .95

(c) .9901 ≈ .99

(d) .9772

(e) .9772

4-21 




0 2 




4-23 (a) .092

(b) .251

(c) .657

sum = 1

4-25 (a) x p(x) 



4-31 (a) 



0 



(b) 



2 

R — 



2/16 
6/16 
6/16 
2/16 



-2 
0 



12/16 
4/16 



16/16 
-1.5 

12/16 = .75 
let Y = \X -2 

P(y) 



16/16 



6 



6/16 
8/16 
2/16 



yp(y) 



0 

8/16 
4/16 



16/16 (A = 12/16 



(ii) x-. 


p(x) 


\x~2\ 


\x - 2\p(x) 


\'\ 


2/16 


2 


4/16 




6/16 


1 


6/16 


■ V 


6/16 


0 


0/16 


\ 3 


2/16 


1 


2/16 


i 

1 . 


f x = 12/16 



(c) 



E(X- 



(d) mx 



4-27 (a) 
(b) 
fc) 



4-29 (a) 
(b) 



L }l f = 3/4 « of course 



Σ (x = 3 to 5) C(5, x)(.6)^x(.4)^(5−x) = .683

A sample of 5 has a 68% chance of correctly predicting, whereas a single observation has only a 60% chance.
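The .683 figure can be reproduced as the probability that a majority (3, 4, or 5) of 5 independent observations fall on the correct side, each with probability .6. Reading the answer as a majority rule is an assumption made for this sketch.

```python
from math import comb

p_single = 0.6   # chance that one observation predicts correctly
n = 5            # sample size

# Probability that at least 3 of the 5 observations are correct.
p_majority = sum(comb(n, x) * p_single**x * (1 - p_single)**(n - x)
                 for x in range(3, n + 1))
print(round(p_majority, 3))
```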



.006;: 
10.124 



e       Pr(e)
SSS     15/48
SSF      3/48
SFS      5/48
SFF      1/48
FSS     15/48
FSF      3/48
FFS      5/48
FFF      1/48
        48/48

x       p(x)
0       15/48
1       23/48
2        9/48
3        1/48
        48/48

μ = 44/48 = .92
(note μ = 1/2 + 1/4 + 1/6)
(b) 10/48 = .21



5-1 (a) 



(b) 



■\ 


0 


1 


2 


3 


/>(*) 


0 


1/16 








1/16 


1 




2/16 


2/16 




4/16 


2 




2/16 


2/16 


2/16 


6/16 


3 




2/16 


2/1 « 




4/16 


4 


1/16 








1/16 


0 




2 


4 




1 

y 



» • 
» • 

4f- 



or 



4 

T-r 



j 

4 



(c) 2, 1

(d)  x     p(x | Y = 2)
     1     1/3
     2     1/3
     3     1/3

(e) 2, 2/3

(f) No. For example, p(0, 1) ≠ p_X(0) p_Y(1).

5-3 (a) 



(b) 1.2 

(c) .2 

5-7 -.6 
5-9 (a) Yes 



(b) (i) 0, because of symmetry
(ii) E[X) + £fX) - 3^ 



• • • 





4 










(b)  x     p(x)
     0     .2
     1     .6
     2     .2

(c) 1, .4

(d)  x     p(x | y)
     0     1/6
     1     4/6
     2     1/6

(e) 1, 1/3

(f) No, because, for example, p(0, 3) ≠ p_X(0) p_Y(3)



u      p(u)    u p(u)
−1     .2      −.2
0      .1      0
1      .4      .4
2      .1      .2
3      .1      .3
5      .1      .5



5-11 (a) s     p(s)
         2     1/9
         3     2/9
         4     3/9
         5     2/9
         6     1/9
μ_S = 4
σ²_S = 4/3

(b) For X1 and X2:
μ = 2, σ² = 2/3

(c) E(X1 + X2) = E(X1) + E(X2)
var (X1 + X2) = var X1 + var X2
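The table and the additivity in 5-11 can be reproduced by enumeration. The sketch assumes that X1 and X2 are independent and each equally likely to be 1, 2, or 3; that assumption is inferred from the printed μ = 2 and σ² = 2/3, not stated in the answer.

```python
from fractions import Fraction
from itertools import product

values = [1, 2, 3]               # assumed outcomes of each draw
p_each = Fraction(1, 3)          # assumed equal probabilities

# Distribution of the sum S = X1 + X2 under independence.
p_s = {}
for x1, x2 in product(values, repeat=2):
    p_s[x1 + x2] = p_s.get(x1 + x2, 0) + p_each * p_each

def mean_var(dist):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mean = sum(v * p for v, p in dist.items())
    var = sum((v - mean) ** 2 * p for v, p in dist.items())
    return mean, var

mean_x, var_x = mean_var({v: p_each for v in values})
mean_s, var_s = mean_var(p_s)
print(sorted(p_s.items()))
print(mean_x, var_x, mean_s, var_s)
```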



5-13 (a) s     p(s)
         2     .1
         3     .2
         4     .3
         5     .3
         6     .1
μ_S = 4.10, σ²_S = 1.29

(b) μ_1 = 2.00, σ²_1 = .60
μ_2 = 2.10, σ²_2 = .69



/< - 1.2 



(c) cov (X1, X2) = 0 by symmetry



5-15 (a) 



5-17 




y      p(y | X = 5)
5      .2
6      .6
7      .2



(e) No, because (b) and (d) are different

(f) J 



            w
h      140    150    160
65     .2     .1
70     .1     .2     .1
75            .1     .2



Negative covariance means that a high first grade X1 tends to be followed by a low second grade X2. This may be because a student who does well on the first exam becomes overconfident and fails to study for the second exam. Similarly, a student who does poorly on the first exam may study very hard for the second.

The negative covariance makes the average grade less fluctuating (σ = 10 instead of 15).



65 
70 
75 



140 



160 



i — r 



(b) 



5-19 (a) 



(c) 



s 



■pU) 



(b) 



.5 
.5 



y      p(y)
5      .2
6      .4
7      .4

(c) μ_X = 5.5, μ_Y = 6.2
Pis) 



.1 
.4 
.2 
.3 



h      p(h)
65     .3
70     .4
75     .3
μ_H = 70

w      p(w)
140    .3
150    .4
160    .3

σ²_H = 15

σ²_W = 60



/far 



11,7 = p x + M F 



μ_W = 150

(d) 20 

(e) 143.3, 156.7 

(f) No, because σ_HW ≠ 0
(g) μ_Z = 590 (= 2μ_H + 3μ_W)
σ²_Z = 840 (= 4σ²_H + 9σ²_W + 12σ_HW)

σ_Z = 29.0 (= √840)
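The figures in (d) and (g) can be checked from the joint table of h and w. The sketch assumes Z = 2H + 3W, which is what the printed coefficients suggest; blank cells of the table are taken to be zero.

```python
from fractions import Fraction

tenth = Fraction(1, 10)
# Joint probabilities p(h, w); cells not listed are assumed to be zero.
joint = {
    (65, 140): 2 * tenth, (65, 150): 1 * tenth,
    (70, 140): 1 * tenth, (70, 150): 2 * tenth, (70, 160): 1 * tenth,
    (75, 150): 1 * tenth, (75, 160): 2 * tenth,
}

def expect(g):
    """Expected value of g(H, W) under the joint distribution."""
    return sum(g(h, w) * p for (h, w), p in joint.items())

mu_h = expect(lambda h, w: h)
mu_w = expect(lambda h, w: w)
var_h = expect(lambda h, w: (h - mu_h) ** 2)
var_w = expect(lambda h, w: (w - mu_w) ** 2)
cov_hw = expect(lambda h, w: (h - mu_h) * (w - mu_w))

# Assumed linear combination Z = 2H + 3W, computed directly.
mu_z = expect(lambda h, w: 2 * h + 3 * w)
var_z = expect(lambda h, w: (2 * h + 3 * w - mu_z) ** 2)
print(mu_h, mu_w, var_h, var_w, cov_hw, mu_z, var_z)
```

The direct variance of Z agrees with 4σ²_H + 9σ²_W + 12σ_HW, the cross term being 2 · 2 · 3 times the covariance.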

coded

z      p(z)    z′ = (z − 600)/10    z′ p(z′)
550    .2      −5                   −1.0
560    .1      −4                   −.4
570    0       −3                   0
580    .1      −2                   −.2
590    .2      −1                   −.2
600    .1      0                    0
610    0       1                    0
620    .1      2                    .2
630    .2      3                    .6
                                    −1.0

μ_Z = 600 + 10(−1.0) = 590
Similarly, σ²_Z = 840



5-23 (a)   5¢    10¢    25¢    25¢
          (H     H      H      H)
          (H     H      H      T)
          (H     H      T      H)
          (H     H      T      T)
          (H     T      H      H)
           etc.







(b)  r     p(r)
     0     1/16
     5     1/16
     10    1/16
     15    1/16
     20    0
     25    2/16
     30    . . .
     65    1/16







(c)  x1    p(x1)
     0     1/2
     5     1/2
μ_1 = 5/2, σ²_1 = 25/4

(d) μ_2 = 10/2, σ²_2 = 100/4
μ_3 = 25/2, σ²_3 = 625/4
μ_4 = 25/2, σ²_4 = 625/4

(e) μ_R = Σ μ_i = 65/2 = 32.5
σ²_R = Σ σ²_i = 1375/4 = 343.75

5-25 (a)








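(b) The marginal distributions p(x1) and p(x2) appear in the margins of the joint table:

               x2
x1        1      2     . . .   6      p(x1)
1         0      1/30  . . .   1/30   1/6
2         1/30   0     . . .   1/30   1/6
. . .
6         1/30   1/30  . . .   0      1/6
p(x2)     1/6    1/6   . . .   1/6    1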



μ_R = 32.5
σ²_R = 343.75



(c) No, because, for example,
p(1, 1) ≠ p_1(1) p_2(1)
i.e., 0 ≠ (1/6)(1/6)

(d) −7/12 = −.58

(e) 3.5, 35/12 = 2.92

(f) μ_S = 7.0 (= μ_1 + μ_2)

σ²_S = 28/6 (= σ²_1 + σ²_2 + 2σ_12)
= 4.67

Or compute from p(s) directly (the hard way).
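The covariance in (d) and the variance of the sum in (f) follow from the joint table by enumeration. The sketch below assumes the table describes two draws without replacement from the numbers 1 to 6, which is what the zero diagonal and the 1/30 entries suggest.

```python
from fractions import Fraction
from itertools import permutations

# All 30 ordered pairs (x1, x2) with x1 != x2, each with probability 1/30.
pairs = list(permutations(range(1, 7), 2))
p = Fraction(1, len(pairs))

mean_1 = sum(x1 * p for x1, _ in pairs)
mean_2 = sum(x2 * p for _, x2 in pairs)
var_1 = sum((x1 - mean_1) ** 2 * p for x1, _ in pairs)
var_2 = sum((x2 - mean_2) ** 2 * p for _, x2 in pairs)
cov_12 = sum((x1 - mean_1) * (x2 - mean_2) * p for x1, x2 in pairs)

# Variance of the sum S = X1 + X2, including the covariance term.
var_sum = var_1 + var_2 + 2 * cov_12
print(mean_1, var_1, cov_12, var_sum)
```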



5-27 (a) 

(b) 



6-1 Correction: In the last sentence, interchange "standard deviation" with "range."

6-5 (a) 35, 350/12 = 29.2
(b) 60 − 10 = 50

6-13 (a) 9000 and 900,000
(b) .147



4, ^ 
_ 

X 

—J 


K 8/3 = 1.63 
p(*> 


2 


119 


I 


2/9 




3/9 


5 


2/9 


6 


1/9 



(c 



a x L Vg/6 1.154 
p(x) 



10/ 



1 



M *= 4 

ctjf ±= V8/9 = .943 
(d) See tig. 6-3 (a). 



fl Vv 



(-3.67 < Z < 3.00) = .9987 



6-9 .0154 



1/27 
3/27 
6/27 
7/27 
6/27 
3/27 
1/27 



6-15 (a) .014 

(b) .008

6-17 .24 

6-19 .018 (.023 without continuity correc- 
tion) 

6-21 (a) (.309)^5 = .0028

(b) .131

(c) .131

(d) Since 850 = 170 × 5, (b) and (c) describe exactly the same event. On the other hand, event (b) occurs whenever (a) occurs, and some other times as well.

6-23 (a) Equally 

(b) 2/i, 3.92\ 7 « 
200, 39.2 



6-25 (a) 
6-27 (a) 



Z\> 



100 



V(200)(8.5) 



lj = .016 



p(x) 



0.6 
0.4 

0.2 
0 



(b) 



62 6! 
a 

65, (J 2 = 18/5 
/?(x) 



68 

3.6 



63.5 

65 

66.5 



.3 
A 

.3 



(c) 



P(x) 



0.4 

0.2 
OL-V 



I I 



lLi 



i_L 



62 



65 



68 



6-11 .023 
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(d) fix = 65 = fx 
(e) 



(c) 



<r 2 /3 
1.35- y 



(f) 



4 



(7 2 /N - n\ 

" TT^ - i ) 



<a 2 



O) 

7-1 (a) 71 ± 1.96(3/√100) = 71 ± .59



7-3 .83 ± .032 

7-5 (a) 20(19/20)^19(1/20)

(b) (19/20)^20

(c) 1 − (sum of answers above)

Answers (a) and (b) can be roughly approximated by the normal, as .39 and .30 (or .243, if you like). The correct values are .377 and .358, respectively.
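The two "correct values" quoted in 7-5 are binomial probabilities and can be reproduced directly. The sketch assumes 20 independent trials, each with success probability 19/20 = .95.

```python
from math import comb

n = 20
p = 0.95  # assumed probability of success on each trial (19/20)

exactly_19 = comb(n, 19) * p**19 * (1 - p)   # answer (a)
all_20 = p**n                                # answer (b)
remainder = 1 - (exactly_19 + all_20)        # answer (c)

print(round(exactly_19, 3), round(all_20, 3), round(remainder, 3))
```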

7-7 (b) 9/10. Hence Xis preferable. 

7-9 (a)  x̄     p(x̄)    x̄ p(x̄)
         2     1/9     2/9
         3     2/9     6/9
         4     3/9     12/9
         5     2/9     10/9
         6     1/9     6/9
E(X̄) = 36/9 = μ

(b)  x̄     2x̄ + 1    (2x̄ + 1) p(x̄)
     2     5         5/9
     3     7         14/9
     4     9         27/9
     5     11        22/9
     6     13        13/9
E(2X̄ + 1) = 81/9 = 2μ + 1
unbiased

(c)  x̄     x̄²     x̄² p(x̄)
     2     4      4/9
     3     9      18/9
     4     16     48/9
     5     25     50/9
     6     36     36/9
E(X̄²) = 156/9 ≠ μ²
Bias = E(X̄²) − μ²
= 156/9 − 4²
= 4/3

(d) Similarly,
E(1/X̄) ≠ 1/μ
Bias = .274 − .250
= .024

Theoretically,

(a) unbiased, by (6-10)

(b) unbiased, by Table 4-2

(c) biased; by (4-5), for any random variable:

E(X²) − μ² = σ²

In particular, for X̄:

E(X̄²) − μ² = σ²_X̄

i.e., bias = σ²_X̄
= 4/3

7-11 MLE = .75
= x/n
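The expectations in 7-9 (a), (b), and (c) can be recomputed from the tabulated distribution of the sample mean. This is only a sketch of the arithmetic; the dictionary `p_xbar` simply copies the table.

```python
from fractions import Fraction

# Sampling distribution of the sample mean, copied from the 7-9 table.
p_xbar = {2: Fraction(1, 9), 3: Fraction(2, 9), 4: Fraction(3, 9),
          5: Fraction(2, 9), 6: Fraction(1, 9)}

def expect(g):
    """Expected value of g(sample mean)."""
    return sum(g(x) * p for x, p in p_xbar.items())

mu = expect(lambda x: x)                     # E(X-bar), equal to mu
e_linear = expect(lambda x: 2 * x + 1)       # E(2 X-bar + 1)
e_square = expect(lambda x: x * x)           # E(X-bar squared)
bias_square = e_square - mu**2               # bias of X-bar squared as estimator of mu squared
var_xbar = expect(lambda x: (x - mu) ** 2)   # variance of X-bar

print(mu, e_linear, e_square, bias_square, var_xbar)
```

The bias of the squared sample mean equals the variance of the sample mean, as the theoretical argument in (c) says.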



7-13 (a) 
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2 & ~ t*) 2 



(b) Yes. Proof: 

4i 



(.72)(.28) (.66)(.34) 
8-19 .06 ± + 

, = .060 ± .128 



£ E(X t - /if 



8-1 1 ± 1.96√.22 = 1 ± .92

8-3 (a) 9 ± 1.96√(136/60) = 9 ± 2.96
(b) Factor of 4, so that n = 240.
(c) .522

8-5 69 ± 1.96(4/√100) = 69 ± .784

8-7 Supposing σ and μ are both unknown, use X̄ ± 2.776 s/√5.

8-9 2 ± 2.048 √(750/28) √(1/10 + 1/20)
= 2 ± 4.1

8-13 .4820 ± 1.96√((.482)(.518)/10,000)
= .4820 ± .0098

8-15 .04 < π < .49

8-17 (a) .1992 ± 1.96√((.199)(.801)/2500)
= .1992 ± .0157

.1992 ± .98/√2500
= .1992 ± .0196

(b) No. The closer P is to .5, the closer an approximation (8-21) will be.

8-21 .39 ± 2.58√(…/1000 + …/300)
= .390 ± .080

8-23 …/3.12 < σ² < …/.072, i.e., 15.2 < σ² < 658

8-25 4 ± 2.776 √(52/4) √(1/3 + 1/3)
= 4 ± 8.17

8-27 (π1 − π2) = (.142 − .114) ± 1.96√((.142)(.858)/2500 + (.114)(.886)/2500)
i.e., π1 − π2 = .028 ± .018

Thus Δπ/π1 = (.028 ± .018)/.142

i.e., relative decline = 20% ± 13%

Although the best guess for the decline is one-fifth, when sampling fluctuation is allowed for, with 95% confidence we can only say that the relative decline was between 7% and 33%.
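The interval in 8-27 can be reproduced with the usual large-sample formula for a difference in two proportions. The function name `diff_proportion_interval` is hypothetical; the inputs are the sample proportions .142 and .114, each from a sample of 2500.

```python
from math import sqrt

def diff_proportion_interval(p1, n1, p2, n2, z=1.96):
    """Point estimate and half-width of a large-sample interval for pi1 - pi2."""
    diff = p1 - p2
    half_width = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, half_width

diff, half = diff_proportion_interval(0.142, 2500, 0.114, 2500)

# Express the decline relative to the first proportion, as in the answer.
relative = diff / 0.142
relative_half = half / 0.142
print(round(diff, 3), round(half, 3), round(relative, 2), round(relative_half, 2))
```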



9-1 I, II 

9-3 (a) Reject H0 iff (P − .50)/σ_P > .67
i.e., P > .533

(b) About 25% of the students will make erroneous rejections of H0.

(c) .085

9-5 (a) Reject H0 iff (X̄ − 8.5)/√(1/49) > 1.645, i.e., X̄ > 8.74

Since X̄ = 8.8, reject H0. (It would be more accurate to use the t critical value of 1.68.)

(b) 




9-7 (a) Prob-value = .04 (z = 1.77), i.e., if the claim ($6600) is true, the chance of getting a sample as extreme as this ($6730) is only 4%.

(b) Yes.

(c) I would not reject. However, if possible, I would avoid accepting H0, in order to avoid the risk of a type II error.
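The prob-values quoted in 9-7, 9-9, and 9-11 are upper-tail areas of the standard normal distribution. A minimal sketch, assuming one-sided tests in each case, uses the complementary error function from the standard library:

```python
from math import erfc, sqrt

def upper_tail(z: float) -> float:
    """Area under the standard normal curve to the right of z."""
    return 0.5 * erfc(z / sqrt(2))

# z values quoted in answers 9-7, 9-9, and 9-11.
for z in (1.77, 2.48, 2.34):
    print(z, round(upper_tail(z), 3))
```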

9-9 .007 (z = 2.48) 

9-11 Reject H 0 (z = 2.34 and prob-value 
= .010). 

9-13 (a) Three answers: 

(i) Using the normal approxima- 
tion, (which is very rough), 
reject H 0 if 

P < .19 or P > .81 

(ii) Using Fig. 8-4, (which is also 
rough), reject H 0 if 

P < .14 or P > .86 
(iii) Since P is very discrete 
(tenths), it is better to use the 
binomial Table II. It is seen 
that a 5% test is not possible. 



The best that can be done is a 
2.16% test: 

Reject H0 if P = 0, .1, .9, or 1.0,
i.e., if P ≤ .1 or P ≥ .9

(b) Reject H 0 if 

P < .402 or P > .598 

Again, because of the discrete 
nature of P, it would be better to 
state the answer: 

Reject H 0 if 

P ≤ .40 or P ≥ .60

Then a is found by continuity 
correction to be 5.7% (z = 1.90). 



9-15 (a) reject (t = 8.2)
accept (t = .21)
reject (t = 2.60)

(b) 5726 < μ < 6334

Therefore reject, accept, reject. 



9-17 (a) 12,100 < p 

(b) 18.4% 

(c) You cannot reject H0, for either reason (a) or (b).



9-19 (a) .080 ± .043 

(b) prob-value <.001 (z = 3.7) 

(c) The sample difference is statisti- 
cally significant at the 5% level. 

(d) The sociological significance of the 
difference in populations is a rela- 
tive matter. 



9-21 (a) 1 ± .64 

(b) .002 (z = 3.08) 

(c) Yes 
10-1 (a) t = 7 / [√(44/6) √(2/4)] = 3.66 exceeds t_.025 = 2.45. Therefore reject H0.

(b) F = 4(24.5)/(44/6) = 13.36 exceeds F_.05 = 5.99. Therefore reject H0.

Since t² = F and t²_.025 = F_.05, we see in this particular case that the test using t gives exactly the same conclusion as the test using F. Mathematicians have proved in general that whenever F has 1 and k df and t has k df, then t² and F have exactly the same distribution.

(c) 7 ± 2.45 √(44/6) √(2/4) = 7 ± 4.7

10-5 F = 50(114/18)/(329/3) = 2.89, which falls short of F_.05 = 3.06. Therefore do not reject H0.
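The identity t² = F used in 10-1 can be checked on the numbers of that answer. The sketch takes the difference in sample means as 7, the pooled variance as 44/6, two samples of 4 observations each, and a variance of the sample means of 24.5; these readings of the garbled printed answer are assumptions.

```python
from math import sqrt

diff_means = 7.0          # difference between the two sample means (assumed)
pooled_var = 44 / 6       # pooled variance with 6 degrees of freedom
n = 4                     # observations per sample (assumed)

# Two-sample t statistic with equal sample sizes: variance of the difference is s2 * (2/n).
t = diff_means / (sqrt(pooled_var) * sqrt(2 / n))

# One-factor ANOVA F: n times the variance of the sample means, over the pooled variance.
var_of_means = 24.5       # equals 7**2 / 2 for two means that differ by 7
F = n * var_of_means / pooled_var

print(round(t, 2), round(t * t, 2), round(F, 2))
```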



10-7 (a) Hour factor:
F = 27/(10/12) = 32.4 exceeds F_.05 = 6.94. Therefore reject H0.

Man factor:
F = (16/3)/(10/12) = 6.4 falls short of F_.05 = 6.94. Therefore do not reject H0.



(b) For hour factor, the confidence allowance is ±2.77 for the following differences in μ:











1 


2 3 


1 




-3* 3* 


4 




6* 


3 







10-9 95% confidence interval:

μ_X − μ_Y = 6 ± 2.77√(8/5)
= 6 ± 3.5
∴ reject H0 at 5% level.



11-1 (a) S = 760 + (2.6/18) y
= 760 + .144y
or S = −396 + .144Y
(b) a = $760 = estimate of savings of the average person.
a0 = $−396 = estimate of savings of a person with zero income. However, this is extrapolating recklessly.



11-3 (a) .068 bushel (all units are "per 
acre"). 

(b) Not economical (net return = 13.6¢ − 25¢).

(c) 13.6¢



11-5 (a) S(a0, b) = Σ(Y_i − a0 − bX_i)²

(b) ∂S/∂a0 = −2 Σ(Y_i − a0 − bX_i) = 0
(c) a0 = −396, b = .144, as before

(d) The method in the text is easier 
than the method in this problem. 



(c) 878 

(d) s 2 = 7863/2 = 3931. 

(e) Two degrees of freedom for s² are almost too few. It would be better to collect more data. This scarcity of data is even more acute in Problem 13-3.



12-1 (a) β = 2.6/18 ± 3.18√(.0388/18)
= .144 ± .148

(b) β* = 15.4/18 ± 3.18√(.0388/18)
= .856 ± .148

Note that β* = 1 − β, and the error allowances for β* and β are the same.



12-3 t = (2.6/18)/√(.0388/18) = 3.11, which falls short of t_.01 = 4.54. Therefore do not reject H0.
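The arithmetic behind 12-1 and 12-3 uses the same slope estimate and standard error. In this sketch the variable names are chosen for illustration, 2.6/18 is the estimated slope, .0388/18 is its estimated variance, and 3.18 is the t critical value quoted in the answer.

```python
from math import sqrt

slope = 2.6 / 18                   # estimated regression slope b
std_error = sqrt(0.0388 / 18)      # estimated standard error of b
t_critical = 3.18                  # critical value quoted in answer 12-1

half_width = t_critical * std_error    # error allowance for the interval in 12-1
t_statistic = slope / std_error        # test statistic in 12-3

print(round(slope, 3), round(half_width, 3), round(t_statistic, 2))
```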

12-5 It is preferable to observe i in a period 
of wide fluctuation. 



13-3 (a) (1) 

(2) 

(3) 
(4) 



760 « a 
2.6 = \ %b - 18c 



-6.3 
-.0017 



Mid 

-186 + 144c 
+ .024c/ 

-.007£ + ,024c 
4^.000010^ 



(b) a = 760 
b = .1054 
c = -.0242 
</ = -38.1 



13-5 (a) 36.7 

(b) 25.5, which is much better. 



13-1 (a) S = 760 + .115y − .029w

(b) Coefficient of y is .115, which is less than the former value, .144. The multiple correlation coefficient is the proper measure of "the relation of S to Y, other things being equal." The simple correlation coefficient measures the relation of S to Y, taking no account of W. In fact, W is (negatively) correlated with both S and Y, and thereby produces a misleadingly high correlation between S and Y themselves.



13-7 (a) S = 269/8 − (52.5/42)(T − 7.5)

= 33.6 − 1.25(T − 7.5)

(b) There is serious bias caused by 
the fact that we started at a 
seasonal high (Christmas), so 
that of course the time trend is 
downwards. 



13-9 β = −8 ± 2.45√(48/2)
= −8 ± 12.0



13-11 Make p is better by .38 mpg. 
14-1 (a) 24/√((44)(34)) = .62
(tj) -|9 < p < .95 
(d) No 



(a) 35 2 /(iOO)(20) - .62 

(b) .3 8| 

(c 1 ) fi 4,7 while F 05 = 10.13 
/ i 2.2 while* a« = 3.18 



P± .35 ± .51 



Fo 
no 



(e) alone 



can 



any of these 3 reasons, do 
reject H 0 . 



is false, and should be: 
b < 1, no strict conclusion 
be drawn about b^ 



(a) .874 

(b) .9^2 
(b) .9^2 

(cl) .7$, .984 

(e) R 2 > r 2 necessarily 



) r sr-W 



(a) X 

(b) .o: 

(c) i,: 

W) 28 
e) - 



6 
2 

5.3 



a) .1, .4, .5 

b) .28, .44, .28 



15-3 (a) tf x 

(b) a, 

(c) flf 8 

(d) False. It should be:

"Action a1 is best when 'rain' is predicted, and also when no prediction is possible. Action a3 is best when 'shine' is predicted. Action a2 is never best!"

15-5 (a) Midrange, median, mean, mode, 
(b) Correct. 

:i 

15-7 (a) Mode, 73 or 74 

(b) Median, 73 to 74 

(c) Mean, 73.5 



15-9 Closer to 20, because the data are 
twice as reliable. 



15-11 (a) is less believable, because it puts 
complete faith in a very small 
and unreliable sample. 



15-13 (a) 103.54, α = .33, β = .020

(b) 113.12, α = .05, β = .195
Average loss increases by a factor of 3.16.

(c) r0/r1 = 4/1, which is unreasonable.



15-15 (a) You do not want to play, because 
the value of the game is 10/8 to 
me, which I could win by using 
the strategy mix : H played 5/8 of 
the time, T played 3/8. 

(b) Each play H and T equally often,
which results in a zero payoff. I 
would secretly choose my penny 
only if my opponent was also 
secretly choosing and seemed 
easy to outwit. 



J. 



Glossary of Important Symbols 



Symbol 
(a) English 
a 

ANOVA 
b 



C 

c 2 

d.f. 

e 
E 
E 

E(X) 
F 

*i 
iff 

L{) 

MLE 

MSD 



Meaning 

Letters 

estimated regression intercept 
analysis of variance 
estimated regression slope 
bias 

number of columns in analysis of
variance, or

estimated regression coefficient

constant coefficient in a contrast

modified chi-square variable

degrees of freedom

regression error

(also F, G, etc.) = event

not E

expected value of X = μ_X
variance ratio

null hypothesis
alternate hypothesis
if and only if
likelihood function

maximum likelihood estimate (estimation)
mean squared deviation
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Definition or Other 
Important Reference 

(11-7), (11-13) 
Table 10-6 

(11-7), (11-16), (12-13) 
(7-12) 

(10-27) 

(13-3) 

(10-22) 

(8-23) 

(8-11) 

(12-3), (12-4) 
(3-6) 
(3-17) 
(4-17b) 

(10-7), (10-17), (10-28), 
(14-14) 

(9-1) 

(9-2) 

(7-24), (12-48) 
Table 7-2 
(2-5), (7-13) 
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Symbol 

MSS 

n 
N 

N( , ) 
P 

Pr (E) 
Pr (EjF) 
p(x) 

V) 



r 2 

r XY.Z 

R 

4 
s 

ss 
t 

T 

var 
W 
X 



Meaning 

mean sum of squares 
sample size 
population size 

normal distribution, with specified 
mean and variance 

sample proportion 

probability of event E 

conditional probability of £, given F 

probability function of X 

joint probability function of X and Y 

conditioned probability function of X, 
given Y = y 

simple correlation, or 

number of rows in analysis of variance 

coefficient of determination 

partial correlation of X and Y, if Z
were held constant

multiple correlation, or 
decision rule 

variance of sample, or 
residual variance in regression 

pooled variance of samples 

sample sum 

sum of squares, or variation 

student's t variable 

time 

variance = a 2 
weighted sum 

(also Y, V, W, etc.) = random 
variable, or 

regressor in original form 

(also 2/, u, etc.) = value of X, or 
regressor in terms of deviations from 
the mean 

sample mean of X (note this is a 
different usage than E) 



Definition or Other 
Important Reference 

Table 10-6 

(6-17) 

(6-17) 

(6-31) 

(1-2), (6-28) 

(3-1)

(3-22) 

(5-5) 

(5-2b) 

(5-10) 

(14-4), (14-16) 

(10-27) 

(14-29) 

(14-39), (14-40) 

(14-43), (14-44) 
(9-3) 

(2-6) 
(12-24) 

(8-16), (10-26) 
(5-16), (6-2) 
Table (10-6) 
(8-10), (12-26) 
(13-24) 
(5-32) 
(5-30) 

(4-1), (4-2) 
(11-4) 
(Fig. 4-2) 

(11-5)
(2-1), (6-9)



Symbol 



(b) Greek Letters
a 



8 
f* 

TT 

n 

Pxy 

a 

<7 2 



(c) Other 
E u F 
E r\F 
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Meaning 

realized) value of X. After Chapter 8 
this distinction between capital and 
little letters is forgotten 

standard normal variable, or 
a second regressor 



Definition or Other 
Important Reference 



(7-9) 

(4-13), (8-9) 
Table 13-1 



Greek letters are generally reserved for population parameters as follows:

probability of type I error, or 
population regression intercept 

probability of type II error, or 
population regression slope 

population regression coefficient 



(9-8)
(12-3) 

(9-9) 
(12-3) 



any population parameter 
sample estimator of 6 
population mean 
population proportion 
product of 

population correlation of X and Y
population standard deviation 
population variance 
population covariance of X and Y 
sum of 



(13-1) 
(7-11) 
(7-11) 

(4-3), (4-10), (4-17a)

(1-2), (4-7), (6-20) 

(7-30) 

(H-3) 

(4-4) 

(4-4), (4-5), (4-19)
(5-21), (5-22), (5-23) 
Table 2-2 



MATHEMATICAL SYMBOLS

E or F, or both
E and F



(3-10) 
(3-11) 
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equals, by definition (2-la) 

approximately equals (2- lb) 

is distributed as (6-31) 



(d) Greek Alphabet

                    English                           English
Letters   Names     Equivalent    Letters   Names     Equivalent
Α α       Alpha     a             Ν ν       Nu        n
Β β       Beta      b             Ξ ξ       Xi        x
Γ γ       Gamma     g             Ο ο       Omicron   o
Δ δ       Delta     d             Π π       Pi        p
Ε ε       Epsilon   e             Ρ ρ       Rho       r
Ζ ζ       Zeta      z             Σ σ       Sigma     s
Η η       Eta                     Τ τ       Tau       t
Θ θ       Theta                   Υ υ       Upsilon   u or y
Ι ι       Iota      i             Φ φ       Phi
Κ κ       Kappa     k             Χ χ       Chi
Λ λ       Lambda    l             Ψ ψ       Psi
Μ μ       Mu        m             Ω ω       Omega





Index 



Absolute deviations, 18, 224
Alternate hypothesis, 168
Analysis of covariance, 278
Analysis of variance (ANOVA), 195
  assumptions, 202
  confidence intervals, 206, 216
  hypothesis test, 196, 213
  interaction, 215
  one factor, 195
  regression, applied to, 298
    compared to, 195, 278
  sum of squares, 204
  table, 201, 211, 299
  two factors, 211
  variation, 204; see also Variation
Average, see Mean; Measures of location



Bayesian methods, 312
  classical method compared, 324, 332
  confidence intervals, 327, 329
  cost, 331
  critique, 331
  decisions, 315
  estimation,
    by interval, 327, 329
    and MLE, 332
  game theory compared, 349
  hypothesis tests, 333
  large sample, 328
  likelihood ratio test, 336
  loss function, 315, 318, 323
  prior and posterior probability, 312
  strength, 331
  subjective nature, 331
  utility function, 319
  weakness, 331
Bayes' theorem, 44, 312



Bell, Daniel, 162 

Bernoulli population, 119, 125 

mean and variance, 120 
Best linear unbiased estimator, 240 
Bias, 134 

in regression, if some variable is 
ignored, 273 

of sample MSD, 135 

in sampling, 6 

see also Unbiasedness 
Binomial distribution, 59 

coefficients, table, 362 

cumulative table, 365 

mean, 121 

normal approximation, 121 
sample sum, as a, 120 
table, 363 
trial, 59 
variance, 121 
Bivariate distribution, 78 
normal, 292, 301 

C² statistic, 164, 368
Centers, 12 

Central limit theorem, 113 

for binomial, 121 

for regression, 241 
Chi-square variable, modified, 164 

table, 368 
Classical versus Bayesian estimation, 324, 339
Coding, 22

Collinearity, see Multicollinearity 
Complementary event, 35 
Composite hypothesis, 175, 182 
Confidence interval, acceptable hypotheses, as set of, 2, 191, 216
in analysis of variance, 205, 216
Confidence interval (cont.)
Bayesian, 327, 329
for correlation, 293 
for difference in several means, 206, 
216 

for difference in two means, large 
sample, 150 
small sample, 155, 205 
for difference in two proportions, 161 
example, 2

hypothesis test, relation to, 187 

for mean, large sample, 129, 132 
small sample, 152 

meaning of, 2, 131 

for means, several, see Confidence 
interval, for difference 

one-sided type, 190 

for proportion, large sample, 2, 157 
small sample, 158 

for proportions, difference, 161 

random interval, as a, 131 

for regression coefficients, multiple, 
266 
simple, 244 

for variance, 163 
Consistency, 137, 148 
Continuity correction, 121 
Continuous distributions, 63 
Continuous variable, 9 
Contrast of means, 207, 216 
Controlled experiments, 237 
Correlation, 285 

assumptions, 291 

calculation, 288 

confidence interval, 293 

covariance, compared to, 286 

hypothesis test, 305 

independence, relation to, 91 

interpretation, 286, 291, 300 

multiple, 310 

partial, 306, 308 

population, 285 

regression, compared to, 285, 296, 301, 

305 
sample, 286 
simple, 285 
Counted data, see Bernoulli population; 

Binomial distribution 
Counter variable, 120, 157, 270; see also 
Dummy variable regression 



Covariance, 88, 286 

and independence, 91 
Criteria for fitting a line, 223 
Critical point in hypothesis testing, 168 
Cross section information, 269 

Decision theory, 312
Deduction, 3, 106
Degrees of freedom, 154 

in analysis of variance, 199 

in multiple regression, 259, 273, 311 

in simple regression, 243 

in single sample, 154 

in two samples, 156 
Density function, 64 

Dependence, statistical, see Independence 
Destructive testing, 5 
Deviations, 17 

Difference in means and proportions, see
Confidence interval, for difference
Discrete variable, 8, 52 
Distribution, see Probability functions 
Dummy variable regression, 269 

and analysis of covariance, 279 

and ANOVA, 278 

compared to moving average, 277 

for seasonal adjustment, 274 

Efficiency, 136 

economic and statistical equivalence 
of, 137 

of MLE, asymptotically, 148 
of sample mean and median, 137 
Error, confidence interval allowance, 1, 
129; see also Confidence interval 
in hypothesis testing, 169 
in regression model, 236 
residual, after fitting, in ANOVA, 215 
in regression, 243, 275, 297
Estimate, interval, see Confidence 
interval 
point, 128 
Bayesian, 322 

Bayesian versus classical, 324 
estimator, compared to, 132 
and loss function, 323 
properties of, 134 
Estimating (least-squares) equations, in 
multiple regression, 259 
in simple regression, 227 






Events, 31

independent, 45 

intersection of, 34 

mutually exclusive, 34 

union of, 33 
Expectation, see Expected value; Mean 
Expected value, definition, 74 

of a function of random variables, 73,

| 8 J 

of a game, 345 
of a linear combination, 93
of a loss, 316
of a sample mean, 108, 117
of a sample variance, 135
of a sum, 86, 93, 106
see also Mean
Extrapolation dangers in regression, 249 

F statistic, ANOVA use, 199, 204, 213
distribution, 201
regression use, 299, 311
relation to t, 209, 300, 311
table, 370
Fisher, R. A., 153
Fitted (predicted) value, in ANOVA, 215
in regression, 223, 245, 303
Fraser, D. A. S., 241
Frequency, 8; see also Relative frequency
Functions, of one random variable, 72
of two random variables, 84

Game theory, 340
Bayesian solution, compared to, 349
conservative, as too, 348
loss (payoff) function, 341
minimax and maximin, 342, 347
nature as opponent, 348
saddle point, 342
strategies, dominated, 347
mixed, 344, 347
pure, 340
strictly determined games, 340
Gaussian distribution, see Normal variable
Gauss-Markov theorem, 240
Glossary of symbols, 393
Gossett, W. S., 153

Histogram, 11
Hoel, P., 113



Huff, Darrell, 7 
Hypothesis test, 167 

in ANOVA, 196, 213

Bayesian, 333, 339 

composite versus simple, 175, 182 

for correlation, 305 

confidence interval, relation to, 2, 187, 
216 

critical point, 168 
errors of type I and II, 169, 170 
in multiple regression, 266 
one-sided, 168, 190 
power, 170, 176 
prob-value, 179 
in regression, 245, 299 
for seasonal influence, 276 
two-sided, 185, 187 
see also Confidence interval; Null 
hypothesis 

Independence, statistical, 45 

covariance, relation to, 91 

of events, 45 

of variables, 83 
Induction and inference, 1, 3; see also Confidence interval
Interpolating in regression, 247 
Interval estimate, see Confidence interval 
Isoprobability ellipses, 293 

Joint distribution, see Bivariate distribution

Law of large numbers, 49 
Least squares in regression, 225 

attractive properties, 225 

calculations, 228 

coefficients, 229 

equations, 227, 259 

in multiple regression, 257 
Likelihood function, 143, 250 
Likelihood ratio test, Bayesian, 336 
Lindgren, B. W., 149, 331
Linear combination, contrast of means, 
207 

of random variables, 93 
regression slope, 239 
Linear transformation, of a normal 
variable, 70 
of observations, 19 
of random variables, 58, 93
Logarithms, 375

Loss function, 315, 318, 323, 341 
McDonald, J., 7 

Maximum Likelihood Estimates (MLE), 
141 

as Bayesian estimates, 332 
of binomial π, 141, 142, 144
geometric interpretation, 147, 251 
large sample properties, 148 
least squares, equivalence to, 250, 253, 
254 

of mean (normal), 142, 145 
versus method of moments, 148 
in multiple regression, 257 
of parameter in general, 142, 147 
of proportion, 141, 142, 144 
in regression, 250 
small sample deficiencies, 333 
Mean, of Bernoulli population, 120 
of binomial, 121 
conditional, 83 

confidence interval for, see Confidence interval
of linear combination, 58, 93 
MLE, 145 

of population, 56, 66, 103
posterior, 328 

of random variable, 56, 66, 29 
of regression coefficients, 238 
of sample proportion, 122 
of sample sum, 106 
of sum, 86, 93 

see also Expected value; Sample mean 
Mean squared error, 137 

and consistency, 138 

related to bias and variance, 138 
Mean squared deviation (MSD), 18 

bias, 135 
Mean sum of squares, 203 
Measures of location, 12 
Measures of spread, 17 
Median of sample, 12 

as Bayesian estimator, 323 

efficiency, 137 

unbiased estimator, 136 
Minimax and maximin, 342, 347 
Mode, 12 

as Bayesian estimator, 323 

as MLE estimator, 332 



Moments, 16, 19 

method of moments estimation, 148 

see also Mean; Variance 
Monte Carlo, 140 
Multicollinearity of regressors, 260

in partial correlation, 310 

treatment, 264 
Multiple comparisons, 206, 216, 281 
Multiple correlation, 310 

and last regressor, 311 

and regression, 310 

and variation, 310 
Multiple regression, 255 

ANOVA, relation to, 255, 278, 283 

bias reduced, 273 

calculations, 258 

confidence intervals, 266 

error reduced, 237 

estimating equations, 259 

hypothesis tests, 266 

interpretation, 265 

least squares estimation, 257 

mathematical model, 256 

partial correlation, relation to, 308 

see also Regression 

Nonparametric statistics, 241 
Nonsense correlations, 305 
Normal equations, 227, 259 
Normal variable, Z, 66 

approximation to Binomial, 121 

distribution, 67 

t, relation to, 153, 155 

table, 367 
Notation, glossary of symbols, 393 

for mean, 73 

for random variables, 52, 132 
for regressors, 225, 234 
switch, 154, 
Null hypothesis, 168 

danger in accepting, 178, 267 
danger in rejecting, 179 

Operating characteristics curve, 184 
Outcome set, 30 

Parameters of population, 128 

glossary, 395 
Partial correlation, 308
assumptions, 309
computation, 309
regression, relation to, 308
Partition of sample space, 35 
Payoff matrix, 341 
Point estimate, see Estimate, point 
Poisson distribution, 121
Pooled variance, 156, 199
Population, 102 

mean and variance, 56, 66, 103 
Posterior mean, 328
Posterior probabilities, 45, 312, 326 
Posterior variance, 328 
Power of hypothesis test, 170, 176 
Prediction interval in regression, 245 
Prior information in regression, 267 
Prior probabilities, 45, 312, 331

and posterior probabilities, 314, 326, 

i 29 J 
Probability, 27 

axiomatic, 49
conditional, 40
personal, 50, 330, 331
relative frequency, as limit of, 27, 41
subjective, 50, 330, 331
symmetric, 48

Probability density function, 66 



Probability functions (distributions),
binomial, 59
bivariate, 78
conditional, 81, 104, 314
continuous, 63
discrete, 52
joint, 78
marginal, 80, 104
normal, 67
personal, 330, 331
posterior, 312, 326
prior, 312, 331
Prob-value of a test, 179
relation to significance level, 181
Properties of estimators, 134



Proportions, 122; see also Confidence
interval; Relative frequency 

Random digit table, 360 
Random normal numbers, 361
Random sampling, 102 

Bernoulli, 119, 125 

definition, 6, 102

examples, 1, 102, 103



without replacement, 116, 124 

with replacement, 102, 124 

simulated, 26, 56, 105 

as subset of population, 128 

summary, 124 

see also Sampling 
Random variable, continuous, 63 

definition, 52 

derived, 72, 84 

discrete, 52 

function of, 72, 84 

regressor, 254 
Range of sample, 17 
Regression, 220, 234 

as ANOVA, 299 

assumptions, about dependent variable, 
235 

about error term, 236 
about independent variable, 254, 305 
bias, 273 

bivariate normal population, 301 
confidence intervals, 243 

for α, 244
for β, 244
for γ, etc., 266
correlation, compared to, 285, 296, 301, 305
error term, 236 

estimated coefficients, 229, 241 

fixed versus random independent 
variable, 254 

least squares estimation, 225 

likelihood function, 253, 254 

mathematical model, 234, 237 

model limitations, 249 

multiple, 255; see also Multiple regression

nonlinear, 250 

parameters, 235 

prediction, 245, 303 

prediction interval, 245 

residuals, 237, 275, 297 

see also Multiple regression 
Regrets, 335 

Relative frequency, 9, 63, 103 
density, 64 

limit is probability, 27, 66 
Residuals, see Error; Variation 
Robustness, 163

Saddle point, 342
Sample mean, 13, 107 

as Bayesian estimator, 323 

and central limit theorem, 113 

distribution, 109, 112 

efficiency, 137 

as estimator of μ, 128, 136

expected value, 108, 117 

as Gauss-Markov estimator, 241 

as linear transformation of sample 
sum, 107 

normally distributed, 112 

variance, 108, 117 
Sample proportion, as sample mean, 122, 125; see also Relative frequency
Sample space, 30 
Sample sum, 105 

distribution, 109, 115 

mean, 106, 117 

variance, 106, 117 
Sample variance, see Variance 
Sampling, 102 

bias, 6 

methods, 5 

reasons, 5 

see also Random sampling 
Scheffé, H., 206

Seasonal adjustment, using dummy 
variables, 274 
with moving average, 277 
Significance level of hypothesis test, 168, 
170 

relation to prob-value, 181
Sign test, 76 

Simple regression, see Regression 
Skewed distribution, 15 
Slonim, M. V., 7 

Small sample estimation, 152; see also Confidence interval
Square root table, 351 
Standard deviation, 19; see also Variance 
Standardization, of normal variable, 70 

of random variable, 59 
Statistic, definition, 8, 128
Strategies, see Game theory 
Student's t, see t Statistic
Sum of random variables, 85; see also Mean; Variance
Sum of squares, see Variation 



Symmetric distribution, 15 
and Bayesian estimates, 327 

t Statistic, 152

distribution, 153 

F, relation to, 209, 300, 311 

normal, relation to, 153, 155 

table, 368 
Tables, 350 

Test of hypothesis, see Hypothesis test 
Time series, 269; see also Seasonal adjustment

Translation of axes, in correlation, 289 

in covariance, 88 

in regression, 225, 230 
Type I error, 169 
Type II error, 170

Unbiasedness, 134 

asymptotic, 138 

of random sample, 6 

of sample mean, 136 

of sample variance, 135 
Utility versus monetary loss, 319 

Variance, of Bernoulli population, 120 
of binomial, 121 
confidence interval, 163 
as covariance, 91 
explained, 204; see also Variation 
of linear combination, 58, 95
pooled, 156, 199 
of population, 56, 66, 103 
of posterior distribution, 328 
of random variable, 56, 66 
ratio, 199; see also F statistic
of regression coefficients, 238
residual, 204, 237, 275; see also Variation, unexplained
sample, single, 18, 135 
of sample mean, 108, 117 
of sample proportion, 122 
of sample sum, 106, 117
of sum, 94 

unexplained, 204; see also Variation, unexplained
Variation (explained, unexplained, and total), 203, 204, 212, 213, 297
unexplained, 205, 211, 215
Venn diagram, 33



Wallis, W. A.
Weighted average, 15, 94



Wilks, S. S., 149 

Z variable, see Normal variable 
Zero-sum game, 341 



